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Save Time^ No More Retyping! 



Congratulations on acquiring Readiris. This software package will undoubt- 
edly be of great help in recapturing your texts, tables and graphics. 

As efficient as computers are, you have to key in your information first. If you 
have ever retyped a 15 page report or a large table of figures, you know how 
tedious and time-consuming it can be. Use this state-of-the-art OCR package to 
automatically enter text in your applications and you'll acquire an unprecedented 
level of efficiency and comfort! 

Scan a printed or typed document, indicate the zones of interest - or have the 
system detect them for you - and execute the character recognition. Documents 
composed of many pages are processed from start to finish in a single effort. A 
few mouse clicks beat long hours of work as Readiris converts your paper docu- 
ments into editable computer files: it's up to 40 times faster than manual retyping. 

The wizard guides you through the OCR process comfortably: answer a few 
simple questions and you'll obtain quick and easy results with Readiris. You can 
send the reading results directly to your wordprocessor and spreadsheet. To rec- 
ognize faxes and convert PDF documents, you can drag the image files from the 
Windows Explorer to the Readiris application window. Or right-click on an image 
to send it prompty to Readiris. 

Readiris recognizes tabular data and recreates them as worksheets or as table 
objects inside your wordprocessor; your numeric data are immediately ready for 
further processing. 

Based on the Connectionist technology from I.R.I. S., Readiris represents the 
best OCR has to offer. Font-independant feature extraction is complemented by 
self-learning techniques derived from a proprietary neural network. The system 
can learn new characters through context analysis: linguistic knowledge about 
syllables and words improves the OCR performance. 

Readiris supports up to 107 languages: all American and European languages 
are supported, including the Central-European languages, the Baltic languages, 
Greek and the Cyrillic ("Russian") languages. (Optionally, you can read four 



Asian languages - Japanese, Simplified and Traditional Chinese and Korean.) 
Readiris even copes with mixed alphabets: the software detects "Western" words 
that pop up in Greek, Cyrillic and Asian documents - many untranscrible proper 
names, brand names etc. are written using the Western symbols. 

Readiris uses linguistics duhngthQ recognition phase, not after it. As a direct 
result, Readiris recognizes documents of all kinds with top accuracy, including 
low-quality documents, faxes and dot matrix printouts. It copes beautifully with 
badly scanned and copied documents containing too light or dark font shapes. 
Joined characters ("ligatures") are resolved and fragmented forms, such as dot 
matrix symbols, are recomposed. 

User verification in pop-up style not only flags doubtftil characters but also 
increases the system's precision. All solutions confirmed by the user are memo- 
rized, increasing speed and confidence as you go along. Using Readiris means 
rendering it more intelligent each time! This powerftil learning tool allows you to 
train Readiris on special characters such as mathematic symbols and dingbats 
but also to handle distorted fonts as you will find in real documents. 

To increase your productivity ftirther, Readiris not only recognizes your texts, 
but can formatthQm for you as well! Make use of "autoformatting" and Readiris 
recreates a facsimile copy of the scanned document: the word, paragraph and 
page formatting of the original document are retained. 

Similar typefaces are used, the point sizes and typestyles as used in the source 
document are maintained across the recognition. The placement of columns, text 
blocks and graphics follows your original documents. And as Readiris supports 
greyscale and color scanning effortlessly, you can recapture any graphics - be 
they lineart, black-and-white photos or color illustrations. When a document con- 
tains tables, Readiris reorganizes them in real cells and recreates the cell borders 
of the original tables. 

In other words, Readiris allows you to archive a true copy of your documents, 
be it editable and compact text files instead of scanned images! Various levels of 
formatting are available, the choice is up to the user. 
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Readiris supports a wide range of popular scanners: numerous flatbed scan- 
ners, sheetfed scanners, "all-in-one" devices or "MFPs" ("multifunctional pe- 
ripherals") and digital cameras can be used. Readiris also supports the Twain 
scanning standard and some scanning platforms. 
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Credits and Copyrights 



The Readiris software is designed and developed by I.R.I. S. OCR, 
Connectionist, AutoFormat and Linguistic technology by I.R.I. S. I.R.I. S. detains 
the copyrights to the Readiris software, the OCR technology, the linguistic tech- 
nology, the on-line help system and this manual. 

AutoFormat, Cardiris, Connectionist, the I.R.I. S. Linguistic Technology, the 
I.R.I. S. logo and Readiris are trademarks of I.R.I. S. 

XML parser developed by Apache. This product includes software developed 
by the Apache Software Foundation (www.apache.org). 

Acrobat and Reader are (registered) trademarks of Adobe. AsianB ridge is a 
trademark of TwinB ridge. AsianSuite is a trademark of UnionWay. Excel, Win- 
dows and Word are registered trademarks of Microsoft. Intel is a registered 
trademark of Intel. 
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Chapter 1 

Installation 

This chapter discusses the system requirements and installation of the Readiris 
software. 

System Requirements 



This is the minimal system configuration required to use Readiris: 

□ a 486 based Intel PC or compatible. A Pentium based PC is recom- 
mended. 

□ 32 MB RAM. 64 MB RAM is recommended to process greyscale and 
color images. 

□ 110 MB free disk space. 95 MB of disk space suffices when you leave 
the sample files on the CD-ROM. 

□ the Windows XP, Windows ME, Windows 2000, Windows 98 or Win- 
dows NT 4.0 operating system. 

Note that some scanner drivers may not work under the latest Windows 
version(s). Refer to the documentation supplied with your scanner to see which 
platforms are supported. 

Installing the Readiris Software 



The Readiris software is delivered exclusively on an autorunning CD-ROM. 
To install, simply insert the CD-ROM in your CD-ROM drive and wait for the 
installation program to start running. Follow the on-screen instructions. 



Should the installation not begin to run when the CD-ROM is inserted in your 
CD-ROM drive, run the setup program MENU.EXE to install the software. 

Users of Windows XP, Windows 2000 and Windows NT must ensure that 
they have the necessary access rights - contact the system administrator if 
necessary. 

Some installation options are offered. Be sure to install the linguistic data- 
bases of all languages you intend to read. By default, all lexicons are installed. 
You are recommended to install the sample images which are used in the tuto- 
rials of this manual. 



InstdLIShieLd Wizard 



Select Components 

Choose the components Setup will install. 




Select the components you want to install clear the components you do not want to install. 





50G89KI 


2j Sample Images 


17117K 


^ Electronic Manual 


3327 K 


^ Adobe Reader 


G431 K 


Space Required on C: 


99398 K 



Space Available on 
InstallShield 



■ Description 

Includes the linguistic 
databases. Install the lexicon 
of all languages you intend to 
recognize. 



Change.., 



0 KB 



< Back 



Next > 



Cancel 



Similarly, install the Adobe Reader software required to access the software 
documentation, should this be necessary. The electronic manual is by default 
copied to your hard disk. You can also leave it on the CD-ROM. 
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The submenu "I.R.I. S. Applications - Readiris" under the "Programs" menu is 
created automatically by the installation program. 







1 @) IRISPen > 






^ I.R.I.S. on the Internet 




[y Readiris 




1^ Uninstall Readiris ^ 




1 0 User's Manual 



The same holds for a shortcut to Readiris on the Windows desktop. As a 
result, you are able to start Readiris directly from your desktop. 




Uninstalling the Readiris Software 



There are only two correct ways of uninstalling Readiris: using the Readiris 
"uninstall" program and using the Windows (un)install wizard. You are strongly 
recommended notto uninstall Readiris or its software modules by manually eras- 
ing the program files. 

Readiris ^^uninstalP^ program 

Select "Uninstall Readiris" under the submenu "I.R.I. S. Applications - Readiris" 
to start the Readiris "uninstall" program and ft)llow the on-screen instructions. 



Windows (un)install wizard 

Execute the ft)llowing steps to make use of the Windows (un)install wizard. 

□ Click "Settings" under the "Start" menu of Windows and go to the "Con- 
trol Panel". 

□ Click the icon "Add/Remove Programs" under the control panel. 
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£ft Add or Remove Programs 



change or 
Remove 
Programs 



Add New 
Programs 

& 

Add/Remove 

Windows 
Components 



Currently installed programs; 


Sort by; |Name 






^ Adobe Acrobat 


Size 


78,35MB 


' J. 


^ Cardiris 


Size 


35,66MB 




1^ IRISPen 


Size 


2,74MB 




^jj Microsoft Office Professional Edition 2003 


Size 


174,00MB 




[?7 Readiris 


Size 
Used 


99,51MB. 
frequently 




To change this program or remove it from your computerj 


ilick Change/Remove, ^^^^^^^H 


i 




_J 

m 



□ Follow the on-screen instructions to remove the Readiris software. 

Installing Software Options 

There's a single software option available ft)r the Readiris software: the "Asian 
OCR add-on". It allows you to read Japanese, Traditional Chinese, Simplified 
Chinese and Korean. This software is again delivered on an autorunning CD- 
ROM. 
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By installing this option, specific documentation becomes available that dis- 
cusses how you can recognize Asian documents. 



/ -7 



User 's Guide 







1 @) IRISPen ► 




g I.R.I.S. on the Internet 




^ Reading Asian documents ^ 




BO Readiris ^ 




H Unlnstall Readiris 




1 ^ User's Manual 



Installing Related Products 



Depending on the software bundle you acquired, Readiris may be supplied 
with an evaluation version of the related product Cardiris, a business card or- 
ganizer. 

If this free software package is included on your Readiris CD-ROM, it is also 
installed using the autorunning CD-ROM and following the on-screen instruc- 
tions. 

Contact I.R.I.S. to learn more about complementary software; the command 
"Contact I.R.I.S." under the "Help" menu of Readiris details in which ways you 
can get in touch with I.R.I.S. 
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Hide 



Back Forward Home 



Print 



Options 



Contents | Index | Search | 



[3 Welcome to the Readiris help 
^ Introducing OCR 
^ Recognizing Documents 
^ How to...? 
^ Reference Information 
^ Software Versions and Options 
^ Product Registration 
^ Product Support 
Oa I.R.I.S, 

l§ Register your Readiris licence 
l§ Readiris registration form 
|T| How to get product support 

Getting product support by e-mail 

1^ Getting in touch with I.R.I.S. by e-mail 
|T| How to acquire software and hardware 
® I.R.I.S. on the Internet 
® Readiris on the Internet 



How to Get in Touch 
with I.R.I.S. 

Head Office (Belgium) 

Phone: 32-10-45 13 64 
Fax; 32-10-45 34 43 

I.R.I.S. on the Internet 

I.R.I.S, home page: http://www.in5link,conn 
Readiris web site: http://www.readins.CQm 
On-line shop: http://sh0p.iri5link.com 
E-mail info: infogiinslink.com 
E-mail sales: sales@irislink.com 
E-mail support; supporttgiirislink.com . 
supp0rt@irisu5a.com 

USA Office 

Phone: 1-561-921-0847 / 800-477-4744 
Fax; 1-561-921-0854 



An application icon in the submenu "I.R.I.S. Applications - Readiris" under the 
"Programs" menu takes you directly to the I.R.I.S. home page. So does the 
Readiris startup screen and the command "I.R.I.S. on the Internet" under the 
"Help" menu of Readiris. 
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Iffi I.R.I.S. Applications V 


Cardiris ► 
@) IRI5Pen > 








1^ Readihs > 


I, R.I, 5, on the Internet . 




BJ Readiris ^ 
H Uninstall Readiris 
^ User's Manual 



Installed Files 



The installation program has created a folder where the Readiris files are 
located. Never try to uninstall Readiris or some of its modules by manually eras- 
ing the program files, use the Readiris "uninstall" program or the Windows 
(un)install wizard instead. See above. 

Read Me file and documentation 

README.HTM "Read Me" file (in HTML format) 
MANUAL.PDF User's manual (in Adobe Acrobat format) 

Scanner drivers 

Don't hesitate to contact your scanner manufacturer or its representative should 
problems with scanner drivers continue. Most manufacturers allow you to down- 
load the latest versions of the scanners drivers from their web site. 

Register to Vote! 



Don't forget to register your Readiris license! Doing so will allow us to keep 
you informed of future product developments and related I.R.I.S. products. The 
registration benefits, including free product support and special offers, are 
strictly limited to registered users. 
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You can register in many ways: by sending in your registration card or faxing 
its electronic counterpart, by calling I.R.I. S. during working hours and by filling 
out a registration form on the I.R.I. S. web site! 



1? Readiris help 



Hide 



^ ^ c m & 

Back Forward Home Print Optior^s 



Contents | Index [ Search | 



Welcome to the Readiris help 
^ Introducing OCR 
^ Recognising Documents 
^ Recognising Business Cards 
^ How to,,,? 
^ Reference Information 
^ Software Versions and Options 
^ Product Registration 



Register your Readiris license 



@ Readiris registration form 
^ Registering by e-mail 

^ Product Support 

^ I,R,I.S. 



Register Your Readiris 
License 

why you should register 

• Registering allows us to keep you informed of 
future product developments and related 
I.R.LS, products. 

• Registering entitles you to free product support 
and special offers. 

• Depending on the software bundlej you'll 
receive the softkey in return as may be needed 
to continue using Readiris after one month. 

How to...? 
Mail 

Send in your registration card, 
WWW 

Click here to access the Readiris registration form 
on the I.R.LS, web site. 



The Readiris registration wizard as you'll find under the menu "Register" 
of the Readiris software can guide you through the registration process comfort- 
ably 



/ - / / 



User 's Guide 



Readiris registration wizard 




Welcome to the Readiris registration wizard. 

It allows you to register your Readiris software licerise. 

Registeririg allows us to keep you informed of future 
product developments and related I.R.I.S. products. 

Registering entitles you to free product support and 
special offers. 



Next > 



Depending on the software version you acquired, you'll receive the softkey 
in return as may be needed to continue using the Readiris software after one 
month. 



Getting Product Support 



The command "Product Support" under the "Help" menu of Readiris details 
how you can get technical support. Please describe the phenomenon you experi- 
ence clearly and include all relevant data concerning Readiris, your scanner and 
your computer system. 



§? Rfiddiris help 



Hide 



^ ^ m 

Back Forward Home PrH Options 



Contents ] Index ] Search ] 



Welcome to the Readiris help 
^ Introducing OCR 
^ Recognizing Documents 
^ How to...? 
^ Reference Information 
^ Software Versions and Options 
^ Product Registration 
t2l Product Support 

T| How to get in touch with I.R.I.S, 
1^ Contacting I.R.I.S, by e-mail 
1^ I.R.I.S, 



How to Get Product Support 

Free technical support is offered to all registered 

customers, ( Registering also entitles you to 
special offers.) 

Europe 

Hotline; 32-10-45 13 64 (working hours) (all major 

languages) 

Fax: 32-10-45 34 43 

USA 

Hotline; 1-561-921-0847 / SOO-477-4744 (working 
hours) 

Fax: 1-561-921-0854 
WWW 

www.irislink, com/support. html (troubleshooting 
info) 

Click here to access the troubleshooting info. 
E-mail 

support@irislink,com . support@irisusa.com 
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Chapter 2 

Guided Tour 

Readiris is a state-of-the-art OCR package equipped with numerous advanced 
features. We will discuss all major features in this chapter and add many tips and 
hints concerning the use of Readiris. 

Starting the Software up 



Click on the Readiris application in the submenu "I.R.I.S. Applications - Readiris", 
or click on the shortcut to the Readiris application on your desktop. 




The Readiris startup screen and application window are displayed. The startup 
screen displays the version and copyrights of the Readiris software. It also gives 
direct access to I.R.I.S.'s home page - simply click on the URL to visit the 
I.R.I. S. web site. Clicking the mouse anywhere else makes this screen disap- 
pear. 



Reodirisp; 



© 2003 All rights reserved 
Image Recognition Integrated Systems 5A 



For more info on new products and 
upgrades^ visit our web site 
www.irislink.cQm 




The next window concerns the OCR wizard; cUck "Cancel" for the time be- 



ing. 



The First-Time Startup 



Depending on the software bundle you acquired, the first startup may be spe- 
cial: you may be prompted to register your licence. 

If this is the case, the use of Readiris is limited to 30 days, and by registering, 
you receive a free softkey from I.R.I.S. to continue using the software after the 
first month. 

It takes your identification number to generate the softkey; be sure that 
this number is available or mentioned when you register your licence. 
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The identification number on this machine is: 



31C01 42535C088038 88032535444808080 H elp 



To enable this software, you need a ke^J. 
Please contact I.R.I.S. to obtain this key 



Enter vour key number: 



OK 



I don't have this key 



Discovering the Readiris Interface 



The Readiris application window not only contains command menus but 
also two button bars that give quick access to all frequent commands. Initially, 
some command menus are dimmed: they concern the preview. As long as no 
image is opened, they are unavailable. 
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The same goes for the image toolbar on the right side of the appUcation 
window: it contains all commands you need during the image preview. The main 
toolbar on the left gives quick access to all frequent general commands. 

To learn which command corresponds to a certain button, hold your mouse 
pointer over it for a while: a tooltip will tell you what the button does. 
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W Readiris 



File Edit Settings View Proce: 




The window pane or image zone is where the scanned images are dis- 
played. You can drop image files onto the image zone (and on the Readiris icon) 
to recognize them. 

As soon as pages gets processed, an additional toolbar, the page toolbar, is 
added on the left side: it represents the various pages of the document and gives 
access to the page commands using the right-click (the "Context" menu). 
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Getting Started with a First Tutorial 



The best way to become familiar with the operation of Readiris is undoubtedly 
by using it. A number of prescanned images is provided with the software; 
they allow you to get started even when there is no scanner connected to your 
computer. Let's turn to these now. 

The "Source" button on the main toolbar determines whether you are going to 
use a scanner or a prescanned image as image source. 

Color, greyscale and black-and-white images are supported on an equal basis. 
Readiris allows you to open Adobe Acrobat PDF documents, JPEG images. Paint- 
brush (PCX) images, DCX fax images (a multipage version of the Paintbrush 
format), PNG images, TIFF images (uncompressed, LZW, PackBits, Group 3 
and Group 4 compressed), multipage TIFF images and Windows bitmaps (BMP). 
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This capability is particularly useful to convert your faxes into editable text 
files. 

As you are going to open a prescanned image, you should select "Image Files", 
and not the scanner, as image source with the "Source" button. 




Next, click the "Open" button. (When you select the disk as image source, the 
"Scan" button is replaced by the "Open" button and the corresponding "Scan" 
command under the "Process" menu is replaced by the "Open" command.) 

Scan 

You could also select the command "Open" fi-om the "File" menu and open a 
prescanned image directly - this works even if your scanner operates as current 
image source. 




You are invited to select an image file. Select the file ENGLISH.JPG in the 
Readiris folder. As this sample file is a color image, it is not only read from disk: 
a "binarized", black-and-white version is created for the OCR process. 



Converting 




Loading D:\Readiris\english.jpg... 





Finally, the image is displayed in the image zone. The page toolbar indicates 
that a single page is loaded into Readiris. 



File Edit Settings View Process Learn Register Help 



L 
L 



Options 
Scanner 



Recognition 




A word about OCR 

The aim of OCR i.^ l:o automatically enter printed text documents in a very effective and 
low cost way. Althouj^h the first research and development on Optical Character 
Recogrition (OCR) twgan more than 30 years ago. this technology \s still unknown by 
most of the people who could use it for their doCLimenl entr^' applicatioriS 

Mow. you can use thif? effective tool in ytjur office and unburden yourself witl\ the 
fastidious task of retyping printed teJ<t, OCK is the most efficient and fasteat tool to enter 
te^<ts mto your computei' auiorivitically- 




The document is read by your ^caimer. This de^'icc iicls as the "eye" of your computer 
and tiendii it the iUfiag'?. At this Step, the document image is only a meamngles*^ doud of 
blaci;^ points, pixels, on a white bacltgroutui. The OCR software has to extract text 
information from these pixdsj it has to recognize shapes by assigning characters. 

n-ie system extensively uses linguistic databases when analyzing the context, in this way 
finding correct solutions for difficult cases. The user trains the software on new 
characteiis and typestyles, which are recognized autom:3tical]y lat&r on. This learning 
module allows you to read virtually any font. In other words: the software gets more 
inielligejfit each time you us^ itt 



A third way of opening prescanned images is the use of "drag and drop": 
drag images from the Windows Explorer onto the Readiris image zone or on the 
Readiris icon and they are promptly opened. 
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start 



You can even open images from within the Windows Explorer: right-click an 
image file and select the command "Recognize" from the "Context" menu. (This 
command only appears when the file's file type is supported.) 



X Name ^ 



excel, ibt 
Par.ytr 



5ize I 1 



m 

BlFF 



m 



Preview 





Edit 




Print 




Resize Pictures 




Open With 




5end To 




Cut 




Copy 




Create Shortcut 




Delete 




Rename 




Properties 



hun.ytr 



1 KB 


IE 


20 KB 


Y 


44 KB 


T 


20 KB 


Y 


936 KB 


Y 


932 KB 


A 


340 KB 


A 


814 KB 


Jl 


20 KB 


Y 


2KB 


Y 


844 KB 


Jl 


1 KB 


Y 


975 KB 


Jl 


597 KB 


Y 


1.264 KB 


Y 


20 KB 


Y 


20 KB 


Y 


20 KB 


Y 


85 KB 


D 



That does not mean the OCR is promptly executed: to give the user fiill flex- 
ibiUty, Readiris is simply started up and the image is opened. 

The image toolbar on the right side of the Readiris application window con- 
tains all commands you need during the image preview: tools to indicate the zones 
of interest, to rotate the image, zoom in and out etc. 

Zooming in on Images 



Readiris has several commands that allow you to zoom in on a scanned im- 
age, for instance to verify the scanning quality. 
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The image toolbar contains buttons that allow you to zoom in at real size, to fit 
the image to the page width and to fit the entire image in the preview window. 
The "View" menu contains the same commands and adds two extra zoom levels: 
you can display the image at 50% and 200% of its actual size. At actual size, a 
screen pixel corresponds to an image pixel. (Shortcuts are available for all zoom 
levels!) 



^ Fit to Window . Ctrl+F 



Fit to Width H Ctrl+W 

50% Actual Size Ctrl+5 

Actual Size Ctrl+1 

200% Actual Size Ctrl+2 



Also notice that the zoom levels are available on the right-click. Click with the 
right mouse button to invoke the "Context" menu and select the appropriate zoom 
level. 



a 

Sill 



File Edit Settings View Process Learn Register Help 





j Window 




Fit to Width 1^ 






50% Actual Size 






Actual Size 






200% Actual Size 



The aim of OCR is tc 
low cost way. Alth 
Recognition (OCR) I 
most of the people w 



Furthermore, you can double-click the right mouse button over a region of the 
scanned image to zoom in at real size immediately. Repeat the operation to zoom 
out again. 

Finally, you can use the magnifying glass to zoom in on details of the scanned 
document. The magnifying glass is also available on the "Context" menu when 
you right-click the mouse over the image. 



1 View 

Window ► 
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Autoform 



The aijn ''autofariualtidg'' is ta 
the original dMumeDt^ 

ThK (X-H pn^cuss di^d iTHMC ih^ ynAi 
rewpiiM ypw Ec!«l> itcnn fariioi ii fei 

WOK Aid inLn« N^Djl^tiCTL Qt ikiCLinaeQf 
TwqgliL^liOfi... 



th& Kcognizjcd ti^4 or not is np h>ihc user- 
You can pcrfonn OCR Ijcaiust you jusi r«d 
1t>c icxt, Lfi T^tich CH» you ^nll -edit Bitd 
fMifiaf ii y[7iii^;lf. Bod ywi can lecjcale tlic 
jzitfri!^ [tecurtTjaiL including LiB rDiQi^ng. 




text . 
any. 

the ufltr 

ih( ibnt ^p«H (1» aHd i^fHst^le aj^ nuinuiavd 
3;cri}^ lh« iW^ilLotn. Th^ judilkaliua of like 
pn^^raptii' IS aim dal^clHl. Ela^rvEr, aa g^pbics 
an npluml and LHe calumnx □rcn'^ rccrabsd - 
the- inTJgfjpTi jiuc CflMniVi- Hch nihef cIC- 



One^ Decomposing a Scanned Image 



Now that the image is scanned, you have to indicate which parts you want to 
convert into editable text by drawing frames, so-called "windows", around the 
zones of interest. 

Actually, Readiris will do this for you automatically when the option "Page 
Analysis" is enabled under the "Options" button on the main toolbar (or under the 
"Settings" menu). 









Opti 


Page Deskewing 
Detect Page Orientation 




5car 







Automatic page decomposition is particularly useful when columnized texts 
and documents with a complex page layout, possibly including graphics and tables, 
are recognized. 
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L 



Options 



L] Recognize 

L English 
I Learn 



rhe aim of OCR is to torrid ticaly-«fili|rpriiited texl docuinerLtB in a very cflioctlv& and 
low CEifit way. Althcmg^h the fifSt r^tatcfi and devsicTpiin^nb tun OpHcsl Character 
tiecognition (OCR) began more than 30 ysdn ago, rhis technology is still unknown 
mast of tho- pt?nplc wha could u-se it for tl^^u j diociJiriigrLt Mitry appEcatiimB. 



Now, you can use this effective tooi m yiur office and imbwden yourself with the 
fastidious task of retj-ping print^^ te^(t. OCR is the most efficient and fastest tool to enter 
tests into your oomptitef automatically, 




rrhc documicnt is readlyy your scannef. This device acts as the "eyt" ftf your computer 
^nd sends it ttie imagie. At Ibis step, the document imag-e is only a meaningless cloud of 
pack ffoints, pixds, on a white Tjackground. The OCR softfrvane has tu Hxtract text 
Lnf^l^ma ^iun from tibig^ pix:e]S: it has tO rtCogJii^e shapes by assigning characters. 



Thii system cjtttmsivuly uses linguistic datafipsca when analyzing the ccmtuxt, in this wsy 
ifinding correct solutions for difficult cases, The user rrains the software on new 
Icharacicrs and typesiyles, which are recogniacd automatically bttir on. This learning 
module allows you to read virtually any ficunltr In other words; the software gels more 
linlelllggnt each tinie you u se iti '' ■ 



Copyright Imasc RjccDgiuliMi Inttgrarttd Sys 
Wc-b Mtc: h![fvjywww.ifislink^:?iii 



Page decomposition uses three window types: text, graphic and table win- 
dows. Readiris discriminates text blocks, tables and graphic zones containing 
photos, illustrations etc. on the page. (Saving graphics and recognizing tables will 
be discussed at great length below.) 

A color code indicates the window type: text zones have a yellow border, 
graphic windows have a blue border and tables a purple border. 
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The number of windows is indicated at all times in the tooltips of the "Text 
Window", "Graphic Window" and "Table Window" tools. 

Draw l:exl: window; 6 

Page analyisis is fast, skew-tolerant and highly accurate: it traces complex, 
"irregular" shapes. 

DOWNSIZED 

mvm\ Sceptre's 
Cf)'U12TLCD panel 
that you can tuck it In 

The page analysis will even detect zones where you get white text on a 
black background. Recognizing such inserts is no problem: while the preview 
displays the scanned document correctly on-screen, Readiris "inverts" the image 
when the need arises to recognize such text blocks! (You can have your scanner 
generate A7/k inverted images to process pages with white text on a black back- 
ground. See below.) 

One and a Half^ Sorting Windows 



Readiris not only detects the various blocks, but also sortsthQm: the zones are 
sorted top-down, left to right by default to cope with columnized documents. 

Evidently, you can modify the sort order. To do so, click the "Sort" button on 
the image toolbar. The mouse cursor becomes a pointing hand as soon as the 
"sort mode" is enabled. 



Draw graphic window; 1 



Draw table window: 0 




is so svelte 



Click on the windows you want to include. Windows you do not click on are 
simply ignored, excluded from recognition. It's easy to see which windows are 
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selected and which aren't: the selected windows have their fiill color, non-se- 
lected windows have a lighter color tone. 
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lA word about QCRI 



iThe aim of OCR is io autoiwaticany-sijter printed text docjnnerLts in a very cflc£±lv& and 
hyw rcwit way. Although the hi^k r-esuarch and devel(TpTTi«nb i.m Optical Character 
Recognition (OCR) began more than 30 years ago, this technology is stiil unknown by 
[ffiQfit oJ thiQ pMpIc ^ho Muld it for thci t] dncumgnt mtry iippliLalLcmH^ 



Now, you can use this effective tool in ydiur office and unburden yourself with the 
fastidious task of retyping printed text. OCR is the most efficient and fastest tool to e 
texts into your computer automaticaliy, 




"aocuinicnl is read by your scanner. TftE^iA'icc act* a$ the your compute^ 
ind sends it the imagu. At this step, the document irnage is only a meamingless cloud of 
iSack points, pixels, on a white background. The CCR software has in extract tsui 
inform ation hhifi thase plxdS: it has to reoogru^g shapes fay assigning characters. ^ 



Thu svstem extensively uses linguistic datatffiiscs when analyzing the ctmtpxt, in this waj^ 
iflndiTig correct solutions for diffiajll cases. The user trains rbe soilware on new 
j:harac"ters and tyjsestylts, which are recognised automatically lattix cm. This leaminai 
mcidule allows yen to read virtually any font- In other words; the software gels more^ 
' .ntelligent each time you use it! J 



:op>Tj£lit Imasc RccDgiudcHi Integrated S 
Web iiic: h([pjywww.irisitnkj?owi 
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Page analysis is enabled by default. To force Readiris to decompose the cur- 
rent page - because you disabled page analysis by accident, because you erased 
some windows erroneously and want to redo the page analysis etc. you can 
simply click the button "Analyze Page" on the image toolbar. 
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Select the document language before executing the page analysis when you 
are dealing with Asian documents. Specific routines are used for these languages: 
the interline spacing of Asian documents is in most cases bigger than in Western 
documents, the text is made up of small icons ("ideograms") that could easily be 
seen as graphic zones in Western documents and the text may run from top to 
bottom, from right to left. And if you forgot to select the proper language, select 
it afterwards. Readiris re-executes the page analysis automatically! 

Some documents have many "stray" dots on the page, may generate a black 
page border around the actual image etc. To erase all small windows - it's as- 
sumed they don't contain any text - and re-sort the remaining zones, you can 
click the command "Delete Small Windows" under the "Edit" menu. 




Two^ Windowing a Scanned Image Manually 

Page analysis is the automatic way of windowing a scanned page. Alterna- 
tively, you can zone an image manually with the windowing tools of Readiris. 




Select window 



Draw text window: 0 



Draw graphic window; 0 



Draw table window: 0 



To draw a rectangle around a zone of interest, select the corresponding tool 
in the image toolbar, click the cursor in the upper left corner of the window, 
stretch the window by moving the mouse to the lower right corner and click 
again. (Sides smaller than 1 mm are not allowed, they wouldn't even contain a 
single character anyway.) 



Not to worry should you have selected the wrong zone type: you can quickly 
change the type by right-clicking the mouse over a window and selecting the 
command "Window - Type" from the "Context" menu. 



Register Help 



A word a 



I Vie 



Magnifying Glass 
Copy as Text 
View 



Delete 



The aim of OCR is to 
low cost way* Although the fi 
Recognition (OCR) began more tnan ^ yeafss ago^ 
most of the people who could use it for their docum 



Id te 

Graphic I j 
Table 



The windows are automatically sorted in the order of creation: arrows indi- 
cate the sort order. 

You can also frame "irregular" text blocks by drawing polygonal windows 
around them. Non-rectangular windows are created by merging rectangular zones: 
as soon as two rectangles (of the same type) intersect, they become a single 
window automatically! In a way, you're building a house by adding one room 
after the other... (Creating polygonal table windows doesn't make any sense.) 
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\ Register Help 


A worci aDout udv 

The aim of OCR is to automatically enter printed te 
low cost way- Although the first research and 
Recognition (OCR) began more than 30 years ago^ 
most of the people who could use it for their docurr 


\t 1 — 
e| ^ 

. i 
i| 


Now, you can use this effective tool in your offi 
fastidious task of retyping printed text. OCR is the i 
texts into your computer automatically. I 





Furthermore, manual windowing can be combined with window sorting: you 
can draw new windows even when the "sort mode" is enabled. You then use 
sorting to include a number of detected windows and manually create some other 
windows where the page analysis didn't yield the appropriate results. As soon as 
you start creating windows in the "sort mode", all zones you didn't select are 
promptly erased! 

To modify, move and delete windows, you need to select them first. To do so, 
select the "Window Selection" or "arrow" tool in the image toolbar and click 
inside a window. Rectangular markers now appear at each comer and in the 
middle of the window sides. 



A word about OCRI 



To unselect windows, click the mouse button elsewhere. To select addi- 
tional windows, hold down the Shift key while clicking on these extra windows. 
To select a window and the included windows (of another type), hold down the 
Ctrl key while clicking on the main window. 

So much for selecting windows. To modify a window, select it, put your 
mouse cursor over a marker and drag the side to change the window size. 

To move a window, simply select it and drag it to another location. 

To delete windows, select them, right-click them and select the command 
"Window - Delete" from the "Context" menu. Doing so deletes all selected win- 
dows as well as the window under your mouse cursor. 



iThe aim of Cll 





Magnifying Glass 
Copy as Text 
ViefAi 



DCS 



Type 



first 



low cost way* Althoug 

Recognition (OCR) began more than 
most of the people who could use it fo 



Alternatively, you can select zones and choose the "Cut" or "Clear" command 
from the "Edit" menu. The "Cut" command cuts the window(s) to an internal 
buffer, "Clear" erases the window(s) irretrievably. When you paste zones, they 
are inserted in their original position, and you have to drag them to their new 
location. 

In fact, ^//familiar commands from the "Edit" menu apply to the windows: you 
can delete, cut, copy and paste them! The "Undo" command also applies: if you 
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have unfortunately deleted, moved, resized etc. some windows, "Undo" will can- 
cel the last operation. 



Edit ^^^^^^^^^^^^H 


Undo . Ctrl+Z 1 


Cut H 


Ctrl+X 


Copy 


Ctrl+C 


Paste 


Ctrl+V 


Clear 


Del 


Delete Small Windows 


Ctrl+M 


Select All 


Ctrl+A 



Also note that shortcuts are available for all commands! Let's give an ex- 
ample: to erase all existing windows, you can choose the command "Select All" or 
its shortcut Ctrl+A and click the command "Clear" or its shortcut Delete. You are 
now ready to recreate the necessary layout. To restore the previous layout, you 
can choose "Undo" or the shortcut Ctrl+Z. 

Three^ Saving Windowing Templates 



The resulting windowing layouts can be saved as zoning templates for 
future use with the command "Save Layout" under the "File" menu and loaded 
into memory with the command "Load Layout". 




If you have to recognize documents with a similar layout, for instance a 50 
page report where the header and footer should be excluded for obvious reasons, 
a single template can be applied to zone all 50 pages. 

When you load a template into memory, page analysis is disabled automati- 
cally. The zoning template remains active until you re-enable page analysis on the 
main toolbar. 



Actually, there's a nice alternative for zoning templates: the preview tool "Ig- 
nore Exterior Zone" limits the page decomposition to the "cropped" portion of the 
image. 




Select this tool and frame the portion of the image you want to process. When 
you're dealing with a multipage document, you can exclude the same outer zone 
from page analysis on every page. (Re-execute the page analysis to cancel the 
image "cropping", or change the zones manually.) 



I File Edit Settings View Process Learn Register Help 




L English 

L a 




EVIS mOE OPEN 

' I MFhaveloltelic vFlri KTib^rlWs^ 
moan to assign people to that category). Imagine if all these 
tno obsequioui months have not been In the service of a 
transcendent talcntJ Vet i am more conscious all the time that 
he has constructed his infallibility around a wilful inability to 
considtii any ideas which mi>;ht require him to re-evaluate his 
own. Dependent on skills which he both envies and resents, 
he is in the usual state of directors who cannot write. His 
failure to dump me after the wniiam Morris office fuck-up 
was, 1 suppose; the g/eatest complimeni he is likely to pay 
me, 

'1 do not know him much better than 1 did at Christmas, 
though at much greater length. Perhaps he is an enigma 
without a seciet, a man who has abandoned motives; there is 
nu sense In trying to divine the psychological make-up of 
someone who is no longer interested in himself. He limits 
self-knowledge to having inilexlble ambitions. 1 have to hope 
that making this film Is still one of them. Can he really 
consider £jts Wide Shut a "poetic" title? Perhaps its chatm Is 
that it Is undoubtedly of his own composition. If it incites 
him to make the movie, so be it. Let's hope that he ha* not 
reached the state which reduced Jack Clayton to total. If 
fastidious, impotence. It is seven years since Full Metal Jacket, 
a title of which he seems undnly proud, l£ only because it Is 
so cryptic (in the same spirit, Nalmkov was rather childishly 
proud of composing utifathotnable anagrams). Full Mer<sl 
Jacket n would not be a bad title for S.K, After all this time, tie 
still wears an impermeable carapace. I do not know whether 
(let alorte how iHuch) he llkei me or my woik. He can make 
very civil e/forts to be amiable. 

'He never explains why he doesn't like a scene, especially 
wVien be has to concede that it is pielty hinny, I have cotne 
to see that he distrusts my j okes - any jokes - probably becans e 
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Readiris Takes You around the World 



Assuming that the windows are correctly defined, you are now almost ready 
to execute the character recognition. We say "almost", because we haven't veri- 
fied the language and document settings yet. 

The language setting can be found on the main toolbar. 

I ^ • I 

I I English ^ 

Click the "Language" button to modify the document language. 



Ldneudge 



|~ Numeric 



English 


T 


Chamorro 




Chinese (Simplified) 




Chinese (Traditional) 




Corsican 




Croatian 




Czech 




Danish 




Dutch 








^^^^^^^^^^^^^ 




Faroese 




Fijan 




Finnish 




French 




Frisian 




Friulian 




Galician 




Ganda 




German 




Greek 




Greek-English 




Greenlandic 




Haitian Creole 





Cancel 



LdneuagA 



r Nu 



[English 




Chinese (Simplified) 

Chinese (Traditional) 

Corsican 
1 Croatian 

Czech 

Danish 
[Dutch 

Estonian 
Faroese 
Fijan 
Finnish 
French 
Frisian 
Friulian 
Galician 
Ganda 
German 
Greek 

Greek-English 
Greenlandic 
Haitian Creole 
Hani 



□ K 



Cancel 



You can press a letter key to move to it directly: if English is currently se- 
lected, and you want to select Occitan, you can click the "O" key on your key- 
board to go directly to the Occitan language. When several languages have the 
same initial, press the letter several times to go through the options. Let's give an 
example: Readiris reads English and Estonian. By pressing "E" once, you select 
English, by pressing "E" a second time, you select Estonian, and by pressing "E" a 
third time, you're back on English. (To go to anotherlQttQY, say T, press Backspace 
before you enter the "T" character.) 

Readiris is far from limited to English: up to 107 languages are supported! All 
American and European languages are supported, including the Central-Euro- 
pean languages, Greek, Turkish, the Cyrilhc ("Russian") and the Baltic languages. 
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Optionally, you can read Asian documents: the extra module "Asian OCR 
add-on" offers recognition of Japanese, Simplified Chinese, Traditional Chinese 
and Korean. (Simplified Chinese is used on China's mainland and in Singapore, 
where Traditional Chinese is used by Hong Kong, Taiwan, Macau and the over- 
seas Chinese communities.) 

Also note that the British and American - or should we say "international"? - 
variants of the English language are distinguished. 

It takes the appropriate Windows configuration to display Central-European, 
Greek, Turkish, Cyrillic and Baltic characters. You may have to install the Win- 
dows multilanguage support before your Windows system is able to cope 
with these languages. 

On a Windows XP, 2000 and Windows NT 4.0 operating system, select the 
icon "Regional Settings (and Languages)" under the "Control Panel". 



w 



Regional and Languagie Options f?]|^ 



Regional Options [ Languages | Advanced | 

Text services and input languages 

To view or change the languages and methods you can use to enter 
tent, click Details. 

[ Details... | 



Supplemental language support 

Most languages are installed by default. To install additional languages, 
select the appropriate check box below. 

0 Install files for complex script and right-to-left languages [including 
Thai) 

0 Install files for East Asian languages 



□ K Cancel Apply 



On a Windows ME and 98 operating system, select the icon "Add/Remove 
Programs" under the "Control Panel" to find out if the module "Multilanguage 
Support" is installed on your PC. 
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Add/Remove Programs Properties 



Install/Uninstall Windows Setup | startup Disk | 

To add or remove a component, click the check box. A shaded box 
means that only part of the component will be installed. To see 
what's included in a component, click Details. 

Components: 



0 @ Microsoft Exchange 

□ ^Microsoft Fax 

0 (J Multilanguage Support 
0 ^Multimedia C 

□ HThe Microsoft Network 



4.2 MB 
0.0 MB 
10.4MB 
1.1 MB 
0.0 MB ^ 

1.2MB 
29.3 MB 



Space required: 
Space available on disk: 
■ Description 

Includes options to change keyboard, sound, display, and 
mouse behavior for people with mobility, hearing and visual 
impairments. 



I 



1 of 1 components selected 



Details.. 



Have Disk.. 





□K 1 


Cancel 







To view and edit Asian documents, you can install an Asian version of the 
Windows operating system or run specialized "emulating" software (such as 
UnionWay AsianSuite or TwinB ridge AsianB ridge) on a Western version of Win- 
dows to correctly represent the ideograms of these Asian languages. Finally, you 
can use Word 2003, Word 2002 or Word 2000 to view and edit such documents: 
Office 2003 System, Office XP and 2000 were specifically designed to cope with 
documents in many different languages. 

Refer to the Readiris "Read Me" file for more information on this subject. 



Selecting the proper document language is imperative. Based on the selection 
of a language, the software knows which symbol set to recognize. Multi-lin- 
guistic support ensures that "exotic" characters such as B, n, y and 0 are 
recognized correctly. 

Secondly, the software extensively uses linguistic databases to validate its 
results. Suppose that you have to read the word "president" where an ink stain 
makes the "r" look like an "f . Looking things up in the English lexicon, Readiris 
will detect autonomously that the word "president" is being read and that it doesn't 
make any sense to recognize the symbol "f . This "self-learning" technique is 
of course highly dependent on the linguistic context. 

Linguistics offer useful help to solve ambiguous cases such as an "O" which 
might be mistaken for a '0'. Another typical example is the letter "1" and number 
'r which have an identical form in many fonts - think of texts produced on old 
typewriters! The linguistic context helps to determine whether you are dealing 
with"r orT. 

The illustration below shows various shapes of '1' and "1". The shapes on the 
first line are unambiguous, the shapes on the second line are ambiguous, but 
linguistics can solve them. When the context does not suffice, the user inter- 
venes. 

193 1950s, ihr 

Wen. Rossellini 

Readiris Changes Languages As Needed 



But the buck doesn't stop here: Readiris can switch languages in the middle of 
a sentence without any help from the user! When Western words pop up in 
Greek, Cyrillic or Asian documents - many untranscrible proper names, brand 
names etc. are written using the familiar Western symbols -, Readiris can switch 
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to the correct alphabet automatically. In other words, you can activate a mixed 
alphabet of Greek, Cyrillic or Asian and Western characters. 

Be sure to select "Greek-English" or the appropriate Cyrillic language setting 
- for instance "Byelorussian-English". In other words: don't try to just select 
"Greek" or "Byelorussian" as document language and hope that the Western sym- 
bols will come out fine! 



Greek-E 



Russian-E 



Here's an example where a Russian text contains some English words 
the image file ALPHABETS.TIF if you want to try it for yourself! 



open 



File Edit Settings View Process Learn Register Help 





npe/iHa3HaMeHHeM CHCxeMbi OnTH^ecKoro 
Pacno3HaBaHH;i SnaKOB ABJiaerca 

aBTOMaXHMeCKHH BB04 neHaxHwx 

^oKyMeHTOB B naM^iTb KOMObioTepa KpaiiHe 



34»$CKTHBHT)TM 

HecMoTpsi Ha 

CHCTCX. (OCR) 

20 .lex iiajiyi, t 
HeH3BecTHa 
aBXOMaxHMecKoro 
AOK3^eHXOB. 



H AemeBMM nyxeM. 
HTO pa3pa6oxKa axoii 
fcuia npe;inpHHffra eme 
xexHOJTorafl eme noKa 

poKOH nySjiHKe jijisi 
BBojisk MaxepHajia h 



The end result looks like this when opened with the wordprocessor - you may 
have to select a Cyrillic font to display the Russian text correctly. 
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g WordPad 




File Edit View Insert Format Help 






Times New Roman ,yt\ 13 mI Western 






7-i'1'i'2-i-3'i'4'i'5-i-6'i'7' 


1 ■ 8 ■ 1 ■ 9 ■ 1 'lO' 1 '11 ■ 1 ■12- 1 ■13' 1 ■14- il 






npeflHasHaneHHeivi CHcreivibi OnrHHecKoro PacnosHaBaHHS Shskob HBJiseTCH 
aBTOiviaTHHecKHH BBOfl neHaTHbix flOKyrvieHTOB b naivHTb KoivinbfOTepa KpaHHS 
3(J)(^eicT^ffl!B^i^eiueEbiM nyreivi. HecMOTpH na to, hto pa3pa5oTKa 3toh 
CHCTeiv4cor;R) c^bina npeflnpHHSTa eine 20 Jier nasafl, sra TexHOJioi™ eme noKa 
HeH3BecTHa mi-ipoKOH ny6jiHKe flJiH aBTOMaTHHecKoro BBOflaMaxepHajia h 

flOKyMSHTOB. 



For Help, press Fl NUM .:= 



To mix other languages, simply select the language with the most extended 
character set. If you have a document where the, say, French translation is placed 
alongside an English text, you have to select French as language to ensure that 
the accentuated characters such as g, e and u get recognized correctly. 

Defining the Document Characteristics 



Now that the language is set, we'll turn to the other document characteristics. 
You can fine-tune the recognition by specifying some document features: the font 
type and character pitch. (These commands do not apply to Asian documents.) 
Let's clarify what this means. 



Let's start with the command "Font Type" under the "Settings" menu. The font 
modes separate "normal" documents from dot matrix printed documents. "Draft" 
or "9 pin" dot matrix symbols are made up of isolated, separate dots, and highly 
specialized recognition routines are used to recognize them. 



"Letter quality" dot matrix printing, also called "25 pin" or "NLQ" dot matrix, 
requires the "normal" setting, as do the printing qualities typeset, typewritten, 
laser printed and Inkjet printed. 

The setting "Automatic" means that Readiris will detect the font mode auto- 
matically. Let Readiris "auto-detect" the font mode in all cases - unless you are 
sure only dot matrix documents are being read! (Obviously, "Automatic" is the 
default value.) 



The font type is indicated in the tooltip of the "Recognize" button: when no 
message is added to the tooltip, the "auto-detection" of the printing quality ap- 
plies, when the message "Dot Matrix" shows up in the tooltip, the dot matrix 
reading mode is enabled. 



The character pitch can be set with the command "Character Pitch" under 
the "Settings" menu. 



ape-descended life 






Fixed I 
Proportional 
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With fixed or "monospaced" fonts, all symbols of the font have the same 
width. An "i" takes up as much horizontal space on a line as a 
"w", as is the case in this sentence. Think of documents produced 
using a typewriter, where the carriage moves a fixed distance for each typed 
symbol. 

A proportional pitch means that the width of a character depends on its shape. 

Symbols like "m" and "w" are wider, take more horizontal space on a line than the 
"thin" characters "1" or "j". Virtually all books, magazines and newspapers are 
printed in proportional pitch. 

The simplest solution is to leave this option at all times on the default value 
"Automatic", which means that Readiris will detect the character pitch automati- 
cally. 



Readiris Gets More Intelligent Each Time! 



When the document language is selected and document characteristics are 
set, enable the interactive learning and click the "Recognize" button. 



I 



Learn 



Recognize 



The OCR progress is indicated on-screen. You can click the "Stop" button to 
abort the text recognition. 



OCR in progress 




Stop 



At the end of the recognition, Readiris enters the interactive learning phase 
when the learning is enabled with the "Learn" button on the main toolbar. 

(Interactive learning does not apply to Asian documents: learning does not 
make sense for these languages which use thousands of different symbols - and 
you'd have to be able to enter the ideograms, not an easy task when using a 
Western keyboard!) 

Font training can substantially enhance the accuracy of the recognition sys- 
tem. When the user tries to read distorted, defaced forms as are found in real 
documents or stylized font shapes which Readiris does not recognize optimally, 
training can overcome this temporary "failure". 

User learning is also used to train the system on special symbols which 
Readiris is unable to recognize, such as mathematical and scientific symbols and 
dingbats. Some examples: Readiris can be trained to recognize the "tt" symbol as 
"pi" or the dingbat "®" as "Tel". (However, the hst of recognized symbols cannot 
be extended with the symbols "7r"and "®"!) 

The recognized text is displayed progressively and the system stops on doubt- 
ful characters, or - if you are dealing with touching characters ("ligatures") - on 
doubtful character strings. They are always presented in their context, the doubt- 



2-35 



User 's Guide 



fill characters are highlighted. Unrecognized characters are represented by a 
tilde (the "~" symbol). 



New Dictionarr: CiUAy DocumentsWeddiris.dus 



Learn 
Don't learn 
Delete 
Undo 
Finish 



nr 



The first thing you should do is verify if you activated the correct font dictio- 
nary and dictionary mode - these are always indicated in the title of the learning 
window. If that is not the case, click the "Abort" button - the document image is 
redisplayed with the zoning as was created enable the right font dictionary or 
dictionary mode and run the OCR again. (The operation of font dictionaries will 
be discussed shortly.) 

If necessary, enter a character (or character string) for the incorrect or un- 
known shape and click one of the following buttons. 

Learn 

You agree with the proposed solution or correct it. The program saves this 
doubtful character in the font dictionary as "sure", final. Future recognition will 
no longer require your intervention, the shape is considered learnt once and for 
all. 



1835 



Abort 
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In the example above, the system stops on a soiled character, and we click 
"Learn" to accept a shape which cannot be confused with other characters. 

Don^t Learn 

You agree with the proposed solution or correct it. The difference with the 
"Learn" button is that the learnt symbol gets the status "unsure" in the dictionary. 
For future recognition, the system will propose the "learnt" solution but still re- 
quire a confirmation. 

This button is used for symbols which might be confused with others: a de- 
faced "e" which might be mistaken for a "c", a damaged "t" which closely re- 
sembles an "r" etc. 



New Dictiondrv: C:UAv DocumentsWeddiris.dus 






Learn 


Don'tlearn j 


Delete 




Undo 




Finish 1 


1 B 


Abort 1 



The "e" above is seriously damaged - in fact it is close to the "e" symbol -, and 
you should click "Don't Learn" so as not to confuse the two symbols. 

Delete 

The displayed form is eliminated from the output. This button is used to ignore 
"noise" on the documents - spots, coffee stains etc. - which might get recognized 
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as points, commas and what have you and to erase any other unwanted sym- 
bol. 

Undo 

You go back to correct mistakes. You can undo the 32 last decisions. 

Finish 

The learning process is aborted but the OCR continues in automatic mode. All 
decisions by the system thereafter are accepted without user validation. 

Click this button when you see that the recognition is highly accurate and does 
not require detailled proofreading. 

Abort 

Don't confiise "Finish" with the "Abort" button: with "Abort", no output is 
generated and you start all over, with "Finish", the text is created, it just isn't 
proofread in detail! 

The Role of Font Dictionaries 



The results of each training session are temporarily held in the computer's 
memory but can and should be stored in files called "dictionaries" for future use. 

(Don't confuse font dictionaries with (user) lexicons! Font dictionaries con- 
tain character shapes learnt during the interactive OCR phase, lexicons are lin- 
guistic databases that assist the recognition.) 

The font dictionaries should be loaded into memory when you want to recog- 
nize similar documents in order to make use of the extra intelligence they contain; 
in this way, Readiris takes into account the intelligence stored in these font librar- 
ies. You could say that Readiris gets more intelligence each time you use it! 



How does this work? The operation of font dictionaries is controlled by the 
"Learn" menu: you have to select a dictionary with the command "Font Dictio- 
nary" and determine its mode of operation. 



Dictionary 



Look in: I Q f^V Documents 



^My Music 
^My Pictures 
Hny Videos 
ElReadiris.dus 



File name: [Readiris 



Files of tvpe: Dictionarv 



New Dictionarv 
C Append Dictionarv 
C Read Dictionarv 



"3 B H' 



□ pen 



Cancel 



Font dictionaries are limited to 500 shapes, and you are recommended to 
create separate dictionaries for specific applications, for instance per type of 
document. Dictionaries have the default extension *.DUS. Training no longer has 
effect when the dictionary is full: the results of the learning are no longer held in 
memory or written to a dictionary. 

You can set the dictionary mode inside the command "Font Dictionary" or 
directly under the "Learn" menu. Three dictionary modes are available: new, 
append and read. 
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y/ New Font Dictionary 
Append Font Dictionary 
Read Font Dictionary 



y/ Interactive Learning 



By selecting "New Font Dictionary", you indicate that the training results will 
be saved in a /2epy dictionary. (If you select an existing dictionary, its contents will 
be erased.) 

The append mode indicates that the training results will be saved in an existing 
dictionary: the recognition makes use of the extra intelligence already contained 
in the dictionary, and you add new font shapes to it. In simple terms, this option 
allows you to build up a font dictionary in several steps. 

(When you enter a filename for a new dictionary and activate the "append" 
mode, an empty font dictionary is created and you complete it.) 

With the last option, "Read Font Dictionary", the dictionary functions in read- 
only mode: you make use of the dictionary without diMmg new font shapes to it. 

Select the new mode when a single page is recognized. To recognize many 
pages of the same type - pages with the same fonts and printing quality - select 
the new mode for the first page, the append mode for a few pages more and the 
read mode for the rest of the document(s). 

Know that the tooltip of the "Learn" button indicates at all times which font 
dictionary is currently active and in which mode that dictionary operates. 




Automatic learning: C:\My Document5\Readiri5.du5 (New Dictionary)] 
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When you enter the interactive learning, the dictionary and its operating mode 
are indicated in the window title; you should click the "Abort" button and start 
over in case they are wrong. 



New Dictionary: CiUAy DocumentsWeadiris.dus 














Learn 
Don't learn 
Delete 






Undo 
Finish 




DarlUsg 






Abort 1 









Sending the Result Directly to Your Applica- 
tion 

The interactive training concludes the character recognition. As Microsoft 
Word operates as output target by default, your wordprocessor is started up au- 
tomatically at the end of the recognition (if necessary) and the recognized text is 
inserted. 

You may get a progress bar on-screen as the recognized document gets for- 
matted. (Whether this progress bar appears on-screen or not depends on the size 
of the document and the complexity of the formatting to be performed.) 
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Formatting text 




Stop 



The scanned image is displayed again with the zoning as created to be avail- 
able for further processing, it stays there until you scan another page. 

You have indeed converted a paper document into an editable computer file, 
be it 40 times faster than manual retyping! Go ahead and compare it with the 
image you have inside your Readiris window. 

Actually, Readiris offers three different methods when it comes to saving the 
OCR result: sending the recognized document directly to a target application, 
saving the result in an external file and copying the result to the Windows clip- 
board. 

The output target is selected using the "Format" button on the main toolbar 
(or the command "Text Format" under the "Settings" menu). 
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Fornriat 



Output 

<^ Send to 

C External file 

r" Open after saving 
|~ Send by e-mail 



I Microsoft Word 37 / 2000 / 2002 / 2003 ^ 
I Rich TeKt Format |''.fl.i i 



Layout 

f Create body text 

C Retain word and paragraph formatting 
Recreate source document 
W Use columns instead of frames 
^ Insert columns breaks 



Options 

1^ Merge lines into paragraphs 

P' Include graphics 

|~ Create bookmarks 

|~ Embed fonts 

I oiT 



IB 



Cancel 



The "Send to" feature offers a direct OCR link between your scanner and 
your Windows applications: you send the scanned documents directly to your 
wordprocessor, spreadsheet or web browser, to Adobe Reader etc.! 
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Output 

Send to 

C External file 

|~ Open after saving 
|~ Send by e-mail 



Microsoft Word 97 / 2000 / 2002 / 2003 



AbiSource AbiWord 

Adobe Acrobat / Reader - Image-Tent 

Adobe Acrobat / Reader - Tent 

Clipboard 

Clipboard Microsoft Ewcel 
Corel WordPerfect 
HTML editor 
Jarte I.k 
Microsoft EkccI 
Microsoft Internet Explorer 



- Layout 

C Create body tent 



C Retain word and paragraph fc Netscape 

□ penOffice.org Writer 1.0 J.I 
(• Recreate source document Software602 Pro PC Suite 

Sun StarOffice 6.0 
P Use columns instead ol Web browser 



W Insert columns breaks 



WordPad 



At the end of the recognition, the target apphcation is started up and the rec- 
ognized document is opened inside a new text file or worksheet. 



Please wait while loading Microsoft 
Word 97 / 2000 / 2002 / 2003 : 
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Don't forget that the option "Send to" also allows you to copy the recognized 
text to the Windows clipboard, so there is no strict need to export the result. . . or 
save it to an external file! 

Saving the Results in a Text File 



You can indeed write the OCR result to an "external" file. Here again, Readiris 
supports a wide range of file formats incorporating all popular wordprocessors, 
spreadsheets, web applications etc. 
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Text Formdt 



Output 

r" Send to 

(• External file 



I Microsoft Word 97 / 2000 / 2002 / 2003 



"3 



P Open after saving 
|~ Send by e-mail 



■ Layout 

C Create body tent 

C Retain word and paragraph fc 

C Recreate source document 

P Use columns instead ol 

W Insert columns breaks 



Microsoft Word 97. 2000. 2002. 2003 (^doc) 



Options 

Merge lines into paragraphs 

W Include graphics 

|~ Create bookmarks 

|~ Embed fonts 



OK 



DC^ r.dca) 
DisplayWrite [^.dw ) 
HTMLr.htm) 
Jarte1.x[^rtf) 

Lotus WordPro (AmiPro] (^rtf) 
Macromedia Dreamweaver MX 6.1 . MX 2004 ( 
Microsoft Excel [''.csv] 
Microsoft Excel (^htm) 
Microsoft Excel tab. (^txt) 
Microsoft FrontPage 2002. 2003 (^htm) 
Microsoft Word 2.x [".doc) 
Microsoft '^^^^j^j:^^^^^^];^ 

Microsoft Word 97. 2000. 2002. 2003 
Microsoft Works 4.5. 5.0. 6.0 [".wps) 
Mozilla Composer 1.4. 1.5 (^htm) 
Mozilla Navigator 1.4, 1.5 (".htm] 
MultiMate ("mm ] 
Netscape Composer 7.1 [".htm) 
Netscape Navigator 7.1 (".htm] 
OpenOffice.org Writer 1.0. 1.1 (".rtf) 
Opera 6. 7 ["htm) 
Rich Text Format (".rtf) 
Software602 Pro PC Suite (".rtf) 
Sun StarOffice 5.x (".rtf) 
SunStarOffice G.O (".rtf) 
Text -MS-DOS format (".txt) 
Text [".txt] 
Unicode [".txt) 
Unicode UTF-8 (".txt] 
WordPerfect 4.2 (".wp) 
WordStar (''.ws ] 

WordStar 2000 [".ws2] 



.htm] 



The option "Open after Saving" is largely similar to the "send" feature: you 
open the recognized document once it's saved. 
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- □ utput 

C Send to I Microsoft Word 97 / 2000 / 2002 / 2003 ^ 

<^ External file | Microsoft Word 97, 2000, 2002. 2003 (".doc) ^ 

f^ Open after saving 
r Send by e-mail 

However, the method used to address the target apphcation is different. This 
time, the Windows file types determine which apphcation wih be started up. 
It's as if you double-chcked the output file in the Windows Explorer... (With the 
option "Send to", Readiris addresses specific target apphcations directly.) 
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Folder Options 



General 11 View File Types Offline Files I 



Registered file types: 



Extensions File Types 
^ DIF Microsoft Excel Data Interchange Format 



I.^J 




New 



Delete 



Details for 'DOC extension 

Opens with: ^ Microsoft Word 



Change... 



Files with extension 'DOC are of type 'Microsoft Word Document'. 
To change settings that affect all 'Microsoft Word Document' files, 
click Advanced. 



Advanced 



OK 



Cancel ~] 



Apply 



The option "Send by E-mail" creates a new mail message and inserts the 
recognized document as mail attachment. Do you know a faster way of distribut- 
ing a paper document quickly...? 
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- □ utput 

C Send to I Microsoft Word 97 / 2000 / 2002 / 2003 ^ 

<^ External file | Microsoft Word 97, 2000, 2002. 2003 (".doc) ^ 

r~ Open after saving 
P' Send by e-mail 



^ Scdnned report with tdble Messdge (Rich Text) 



File Edit View Insert Format Tools Actions Help 



:l5end | ^ ^ 



I _^ I ^Attach as Adobe PDF | &^ I '! ^ I ^ | ^ | 









Cc... 




Subiect: 


Scanned report with table | 



document, doc 
(15 KB) 



Creating Portable Documents... 



We'll go deeper into one format: Adobe Acrobat PDF. Readiris allows you 
to create PDF documents of two types - PDF Text and PDF Image-Text. 
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p Output 

Send to 

External file 
|~ Open after saving 
|~ Send by e-mail 



[Adobe Acrobat / Reader - Image-Text 




- Layout - 



C Create body text 

C Retain word and paragraph fc fje^s^ape 
(* Recreate source document 



Adobe Acrobat / Reader - Text 
Clipboard 
Clipboard Microsoft Excel 
Corel WordPerfect 
HTML editor 
Jarte 1.x 
Microsoft Excel 
Microsoft Internet Explorer 
Microsoft Word 37 / 2000 / 2002 / 2003 



□ penOffice.org Writer 1 .0, 1 .1 
SoftwareG02 Pro PC Suite 

r. , . jSun StarOffice G.O 
Use columns instead of browser 

|~ Insert columns breaks 



Output 

C Send to 

(* External file 

|~ Open after saving 
|~ Send by e-mail 



Layout 

C Create body text 



[Adobe Acrobat / Reader - Image-Text 

|Adobe Acrobat PDF Image-Text (".pdf) 
AbiSourceAbiWord (^rtf■ 



Adobe Acrobat PDF Text (^pdf) 
Adobe GoLive 6XS (".htm) 
Corel WordPerfect 5.x. 6.x, 8.x, 3, 1 0 (".rtf) 
DCAT.dca) 
DisplayWrite (".dw ) 
HTMLr.htm) 



What's the difference between the two? When you select the format "PDF 
Text", Readiris creates a PDF file that contains the text result. (Graphics may 
occur but only when graphic zones occur on the page - photographs, artwork 
etc.) In other words: the page image is /70/' contained in the single-layered PDF 
file! 
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Q Adobe Reader - [autoform.pdf] 




^1 File Edit View Document Tools Window Help 





IS 



Open 



Select Text 



1^ ' QOIQ © 



■ eBooks 



Autoformatting 

TLi3 siLoi o: "autofbamattmg" is to nacr&ate a fcjcsimila cc-py of 



The OZ'?. p;33ai iiB. : 
tool 



lln 'ft* 




fc-EHLLLiT^ kra: ciBBtlrua bo-dy 

piirkqrBph f-3Enattlnf anil. 
crBBtln^ ■ facalaLLv CDpy. 

ukL. All fOiuLcLnq, if 

d^V , t«,i:wA t dj- :r!y 



If v«j lEbib. IIh uid^w^qiA j'flnativ 
LIh (iwTi I'lTX irtJ '^lEilvli iTD nubJibril 

a=mt lln ns^jjikuc Tk jialiniilicii le Ilia 



'dJkLe, j^spidri aod ta'b.ltii ut cicraanK. Hi 











Cell 3A 


S-Ou.DDu 



d^mcJ, b« J; ■ Einipirl md c^JiJua IetI 
.IcciiiKcJi 



J ^l^ 7j03xSj71in 



1 0f1 



The format "PDF Image-Text" yields different results: Readiris creates a 
searchable PDF file that contains the recognized text andthQ page image. The 



2-51 



User 's Guide 



page image is contained above the text in a two-layered PDF file. Use the "Search" 
tool of Adobe Reader and this becomes quickly obvious! 



Search PDF 



Hide 



Finished searching for: 
OCR 

Total instances found: 
3 



Results: 



New Search 



the OCR process does more than 



^ your OCR software reformats the 
perform OCR because you just need 



^ Done 

^ Use Advanced Search Options 

Complete Adobe Reader 6.0 Help 



Click the "Format" button to discover two options that concern the Acrobat 
PDF format: "Create Bookmarks" and "Embed Fonts". 



Options 

R Merge lines into paragraphs 

1^ Include graphics 

1^ Create bookmarks 

P' Embed fonts 

The option "Create Bookmarks" sees to it that a bookmark is created for 
each document element - the graphics as well as the text blocks and tables. For 
the text zones, Readiris applies an intelligent algorithm to come up with a title, a 
"summary" per zone; the tables and graphics are simply numbered. (Another 
navigational element of PDF documents, page thumbnails, can be created dy- 
namically by your Adobe Reader software!) 
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r 

9 Adobe Reader - 


[autoform.pdf] 


^1 File Edit View Document Tools Window Help 


1 Open ^ Save a Copy Print Email See 




□ © 46% . © li m 



Options ^ X 



"^titles 

Autoformatting 
'-[^The aim 
El-[^ images 

'■■[^ imagel 
&[^tables 
l-[^table1 



Au 

Thf 4iiin yf 

The SViic pni 
yau UhIi! 

In iwjy.w 
f-jn? uid iTwrc pie 



dfrracin-alikhca 

yyilfff Ji/silipcn.in 




B Adobe Reader - [autoform.pdf] 



^ File Edit View Document Tools Window Help 



f Open 



a Copy Print @] Email | 



^ ^ D O □ © ^6% 



Options ^ X 



AuMormattine 


















Ai 

The 4iim <1 

The on 



V,>.£u.p:[4bct 
!jVwM.in™hk 
fiTTniMii jiwrw 
^,|[Wf,'"irP=r 




The option "Embed Fonts" embeds the fonts in the PDF files. Embedding 
fonts prevents font substitution when readers view and print the recognized docu- 
ment. It ensures that readers - whatever their computer configuration may be - 
see the text in its original fonts. However, embedding fonts increases the file size 
of the recognized documents (somewhat)! 



... Or Reading Them 



Let's look the other way for a moment. As Readiris offers fiill support of the 
Adobe Acrobat PDF format, you won't just generate PDF files, you can also 
readt\iQm\ 



"Repurposing" PDF documents may be a major application of Readiris. 
There are several reason why this is the case. First of all, it's a way of converting 
images into text: open image-based PDF documents, execute the recognition and 
save the OCR result to a text document (in any supported text format). Text files 
are editable, image files are not. 

Second case: you can convert image-based PDF files to text-based PDF docu- 
ments. You then execute the recognition on "image-only" PDF files and save the 
OCR results... as text-based PDF documents! Text-based PDF files are search- 
able and editable, "image-only" PDF files are not. 

Finally, converting PDF files is a way of "unlocking" PDF content. You can 
recognize "read-only" PDF documents, where the text is normally inaccessible. 
With unprotected PDF files, the content can be retrieved (copied and saved to a 
text file), with "read-only" files, the content cannot be extracted. These docu- 
ments can only be viewed and printed! 

An important nuance: Readiris does not open password-protected PDF docu- 
ments, even if all other PDF security barriers are broken down by Readiris! 

Proceed as usual: load PDF files into memory as you open prescanned images 
- faxes, snapshots made with your digital camera etc. Click the "Stop" button or 
press Escape to interrupt the loading process between two pages. 

There's a specific option that concerns PDF files. You can open them as color 
and as black-and-white documents. This option is offered because rasterizing 
color documents is much slower! 
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Look in: | 
^sample 



f Readiris 



"3 B 



File name: | 

Files of tvpe: |PDFr.pdf) 

j~ Digital camera 

V Process as 300 dpi 

n Smoothen color images 

1^ Load PDF documents in color 



□ pen 



Cancel 



Recognizing Multiple Pages 



After the OCR, the scanned image is redisplayed with the zoning as created 
to be available ^or fiirther processing. 

You can now open the recognized text with your wordprocessor or text editor, 
import it into your desktop publishing software or any other text-based applica- 
tion. Go ahead and compare it with the image you have inside your Readiris 
window. 

But how do you save the text of additional pages? Or in other words: how do 
you process documents consisting of multiple pages? It's actually very simple: go 
on recognizing pages and save the results to the same file! (Make sure that file 
isn't currently open, because that will prevent you from writing to it!) Secondly, 
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don't forget to put the font dictionary in the append mode so that you can con- 
tinue the font training comfortably. 

As soon as you scan pages (or open image files) inside a document, you have 
to decide whether you want to start a new document or complete the current 
document. 



Reddiris 



Are you ready to delete the current document? 
Yes I No Cancel 



Answer "no" to add pages to the current document, answer "yes" to create a 
new document. This answer has the same effect as the command "New Docu- 
ment" under the "File" menu. 




But there's a more efficient way of recognizing several pages than scanning 
and OCRing them one after the other: processing multipage documents di- 
rectly! 

To scan a document composed of several pages in one operation, enable the 
document feeder of your scanner with the option "ADF" under the "Scanner" 
button. 

|~ Landscape 
P ADF 
|~ Invert 
|~ Digital camera 
|~ Process as 300 dpi 
l~ Smoothen color images 




Place the pages of your document in the automatic document feeder and start 
the scanning: all pages are scanned until the document feeder is empty. 
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You can also o/7e/7 multiple prescanned images. To load several images, select 
the first image and hold down the Ctrl key as you select additional images. To 
load a continuous range of images, select the first image and hold down the Shift 
key as you select the last image. 



Open 



nput 






m 




Look in: | & Readiris 


_lJ ^ E rt- 




1 1 alphabet I 


I^deskew 


|S| greek 


*]multipag 




^asian 


(tj digital 


|S| italian 


S| norweg 




1 lautoForml 


S dutch 


|S] japanese 


S| polish 




gl brazil 
^ Catalan 
l£j Czech 


IS French 
KSqerman 


|S]korean 
@lite 
|?| matrix 


S|russian 

simp-chinese 
S] Spanish 




< ! 


nil 




1 


a' 



File name: j"english.jpg" "alphabet.tif" "autoform.jpg" 



□ pen 



Files of type: |AII image files 

r~ Digital camera 

|~ Process as 300 dpi 

|~ Smoothen color images 

|~ Load PDF documents in color 



~3 



Cancel 



The same effect can be obtained comfortably from within the Windows Ex- 
plorer: select several image files, right-click and select the command "Recognize" 
from the "Context" menu. You can repeat this operation: all images you send to 
Readiris append the current document until you click the command "New Docu- 
ment". 
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Hlfin 

llPml 
IS Pre 

gjPrn 
g|fru 

i 
m 

@ 

1. 



Edit 
Print 

Resize Pictures 



Open With 




Send To 




Cut 




Copy 




Create 5liortcut 




Delete 




Rename 




Properties 



You can even drag several prescanned images from the Windows Explorer 
onto the Readiris window! The same argument holds: all images you drag onto 
the Readiris window are added to the current document until you click the com- 
mand "New Document". 

Readiris sorts the images automatically - image OOl.tif precedes 002.tif pre- 
cedes 003.tif etc. 

The page toolbar on the left side - it is displayed as soon as pages get 
processed - represents the various pages of the document and gives access to 
the page commands (using the right-click). 
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[^^Rfiddiris 




The current page is highlighted in the page toolbar and mentioned in the Readiris 
title bar. 

The page toolbar comes with a tooltip: hold your mouse pointer over a page 
thumbnail to learn which image was loaded into the memory. (If a multipage 
image was opened, there's obviously just one file for all the images.) When you 
are scanningmultipagQ documents, the tooltip simply mentions the scanner model. 





You can quickly print the scanned images with the command "Print Images" 
under the "File" should you need an overview of your document. 
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You can print the current page or all pages. Select the number of pages or 
thumbnails you want printed on a page. 



Print Images 




C Current page 
C* All pages 
Images per page: 






|G4 ^ [- 








OK Cancel j 







But you don't have to print all pages either: the page toolbar (and the "Edit" 
menu) allow you to exclude pages (temporarily). Right-click a page thumbnail 
and select the command "Exclude Page" on the "Context" menu or display a page 
and select the command "Exclude Page" from the "Edit" menu to exclude it from 
the printing (and recognition) process. Select the command "Include Page" to 
include it again. For greater flexibility, the "Edit" menu offers equivalent com- 
mands that apply to ^77 pages. 
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Select Page 1 K 
Delete Page 1 
Move Page 1 Up 
Move Page 1 Down 







^^^^^^^^^^^^^ 








Include All Pages 




Exclude All Pages 



The thumbnails of excluded pages are stricken out. Mind you, printing the 
current page always works, even if it is "disabled" for the time being! 




Load the sample image MULTIPAGE.TIF and start the recognition. The vari- 
ous pages are displayed one after the other; the Readiris title bar indicates the 
page number. 



' Readiris - C: Weadiris\niumpage.tif (page 3 of 5) 



I File Edit Settings View Process Learn Register Help 



i OCR Wizard 



1 Scanning 



L 
L 



Options 
Scanner 



I English 

L 



□ 




□ 

4 

□ 



ivas committed. Nor shaH a heavier penalty be imposed than the ooe that was applicable 
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If the interactive learning is enabled, you go through the recognition and learn- 
ing phases page by page. The dictionary mode "New" is used for the first page 
and the mode "Append" for the successive pages. 

When you click the "Finish" button, all decisions by the system thereafter are 
accepted without user validation. In other words, the interactive learning is aborted 
for ^//pages; the OCR for this document continues in automatic mode. 

The recognition result of multipage documents is saved in a single output file. 
(When the recognition result is sent to a target application, multiple pages get 
created inside a single document.) 

At least, that's the case when the option "Create One Document per Page" is 
disabled when you save the recognized document. This option sees to it that each 
page of a multipage document is saved in a separate file. If the user gives the file 
name text. doc, the files will be called text-l.doc, text-2.doc etc. 
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File name: [multipage 

Saveastvpe: |Rich Text Format 

r~ Open after saving 

n Send by e-mail 

R Create one file per page 
(multipage-1, multipage-2...) 



Editing multipage documents 



The user can edit multipage documents, mainly to correct scanning errors: he 
can delete pages from the document and move pages to other locations in the 
document. 

The navigation first. To go to a page, click on its icon in the page toolbar or 
hold your cursor over its thumbnail, invoke the "Context" menu by right-clicking 
and use the command "Select Page". To go to the previous page, you can use the 
shortcut PageUp, to go to the next page, press PageDn. Press Home to go to the 
first page, press End to go to the last page. Or use the corresponding commands 
under the "View" menu. 





First Page 


t Home 



Previous Page ^ PageUp 
Next Page PageDown 
Last Page End 



Let's edit the document now. To delete a page from the document, hold your 
cursor over its thumbnail, right-click it and use the command "Delete Page". And 
we remind that you can temporarily exclude pages, not delete them, fi*om the 
recognition (and image printing) process: the page toolbar (and the "Edit" menu) 
offer the necessary commands. 



Save 



t| Cancel 







ll 






Exclude Page 1 
Select Page 1 






^^^^^^^^^^^^^ 








Move Page 1 Down 





To move a page up in the document, use the command "Move Page Up", and 
to move a page down^ use the command "Move Page Down". 

To move a pageto a totally different location in the document, drag its icon to 
that new location. 



Readiris 


File Edit Settings View Process Learn 


Register Help 








national or social origin, p 








Furthermore, no dislincLio 
inLemalional sUilus of the i 
incie pendent, trust, non-se] 

Everyone has the right lo I 

1 No one shall be held io sk 
their forms. 

No one shall be subjected 


Scan 




L 
[ 
[ 


Source 

Options 1 

— ^tf 

Scanner | 





Starting a New Document 



You can use the command "New Document" under the "File" menu to close 
the current document. 
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This command "cleans the slate". Any document loaded into memory - con- 
taining a single page or multiple pages - is erased. You are now ready to create a 
new document. 

But you can also create a new document from within the current document. 
As long as the OCR was not executed, the system assumes that you want to add 
pages to the current document. You can for instance scan all the pages in the 
scanner's autofeeder, fill the feeder again and start over. All pages scanned will 
compose a single document. Or you could scan a number of pages and add some 
image files, say, faxes. These pages again form a single document, all you have to 
do is change the image source in between with the "Source" button. 

When the OCR ff^^ already executed and you re-initiate the scanning (or the 
loading of images), you are prompted to start a new document or complete the 
current document. 



Readiris 


s 




Are you ready to delete the current document? 




Yes 


No 1 Cancel | 









Recognizing Text Zones 



We now know how to recognize pages and how to process multipage docu- 
ments. But can we recognize less than a page with equal comfort? We can! 
Right-click your mouse and select the command "Copy as Text" from the "Con- 
text" menu: the text window under the mouse gets recognized and sent to the 
clipboard. 
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The various levels of 
formatting are: creating body 
text, retaining the word and 

ere at inc NagniFying Glass ^^^y ^ 



Great ini view beans no 

format t window ►^^^d: you 

get a connnuous , funning 
text. All formatting, if 
any, is done afterwards by 
the user. 

If you retain the word and paragraph formatting, 
the font type, size and typestyle are maintained 
across the recognition, The justification of the 
paragraphs is also- detected. However, no grapliics 
are captured and the columns aren*t recreated - 
the paragraph Ju£t follow each other etc. 

"Autoformatiing*' recreates a facsimile 
copy of the original dtxjument^ the text 
blocks^ graphics and tables are recreated in 
the same place and the word and 
paragraph fomiatting are maintainod 
across the recognition. 



^1 



i 



.^1 



The current system settings - language, font type etc. - apply. The OCR result 
is placed on the clipboard as "running", unformatted text. 

Organizing the Text Output 



Saving or exporting the text means more than selecting an output method or 
defining a filename for the output file. You also select a file format and determine 
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the appearance of the recognized text. In short, you have to decide where you 
want to take the text before you launch the execution. 

Some options of the "Format" button allow you to influence the look of the text 
output. 

The text flow of the output document is directly influenced by the option 
"Merge Lines into Paragraphs". 

- □ ptions 

W Merge lines into paragraphs 

W Include graphics 

Keep this option enabled to have Readiris detect the paragraphs: Readiris will 
then apply the normal wordwrap typical of wordprocessors, otherwise, a car- 
riage return is added after each line and hyphenated words remain so! Paragraph 
detection is enabled by default. 

Let's give an example to clear things up. When the first three lines of a col- 
umn are "The new presi-", "dent waved from the balcony." and "His wife had 
joined him.", the paragraph detection gives you the following result: "The new 
president waved from the balcony. His wife had joined him." The hyphenated 
parts of the word "president" were "reglued" and a space was added at the end 
of the first sentence, thus creating naturally flowing text. 

Had paragraph detection not been enabled, the original layout would have 
been retained, with a carriage return added at the end of each line. 

This option is /70/' available when the PDF format is selected: Adobe Acrobat 
PDF files always store text line by line! 

(The "Format" button contains some formatting options we haven't discussed 
yet - this will be done shortly.) 

Setting up Your Scanner 



Let's set our scanner up now. It is assumed that the scanner hardware and 
necessary drivers are installed correctly. 
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If your Readiris software licence was bundled with a scanner or digital cam- 
era, this step probably is unnecessary as your hardware may already be set up 
under Readiris. 

Click the "Scanner" button on the main toolbar. 

II Scanner || 

Click the button "Scanner Model" to determine your scanner model. 



Scanner 



- Type 

HPScarvJetSSOOC 



Scanner Model... 
Config... 



Contrast 0 

-J— 

Brightness 0 

J ■ 

darken lighten 
1^ Dptimize resolution for OCR 



When you select the option "<Image>" as "scanner", prescanned images fiinc- 
tion as image source at all times - you won't have even to select the disk as image 
source with the "Source" button on the main toolbar. 

The "Configure" button is only available when you scanner allows it. It gives 
access to some advanced scanning parameters; with Twain scanners, clicking 
the "Configure" button allows you to select the Twain source. (You can also use 
the command "Select Source" under the "File" menu.) 




Format: 
fA4 



DK 



Cancel 



Resolution: 
[300 [3 

C Black-and-white 
C Grayscale 
Color 



|~ Landscape 

l~ AutoExposure 

r~ Invert 

r~ Digital camera 

|~ Process as 300 dpi 

V Smoothen color images 
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Select Source 




Sources: 






Hewlett-Packard ScanJet 5500C 
IBCR II 2.0 










Select 




iiiii 












Cancel 













Once the scanner is selected, the same window may allow you to set the 
scanning resolution, the page format and orientation, brightness and contrast and 
may allow you to indicate whether you are going to use the scanner's document 
feeder. With Twain compliant scanners, all scanning parameters are often set 
inside the Twain interface. 

Set the brightness, and, if available, the contrast. 

By enabling the option "Landscape", you indicate that the selected page orien- 
tation is wide ("landscape") instead of tall ("portrait"). The page orientation actu- 
ally applies to reduced page formats: on an A4 flatbed scanner, you can scan, say, 
A5 pages (half that big) in portrait or landscape format, but you can obviously 
only scan the full A4 surface in one direction! 



L 



The option "Invert" allows you to generate "inverted" images in the black- 
and-white scanning mode - you can activate this option to process full pages with 
white text on a black background. 



Bring Color to Your Text Scans! 



Readiris supports black-and-white, greyscale and color images on an equal 
basis, so you are free to choose the color mode that best suits your needs. To 
include lineart graphics in the recognized documents, scan in black-and-white, to 
include black-and-white photos, scan in greyscales, to include color pictures, scan 
in color. 

But why would you reduce the bit depth of the images during the scan? It goes 
without saying that greyscale and color images are slower to acquire and require 
more RAM memory than "bilevel" images. 

Scanning in greyscale and color isn't just useful to save the graphics with 
sufficient quality, in some instances, it's also useful or necessary to obtain good 
OCR results! When text is printed on a color background, scanning in color may 
create the tone differences that are lacking in black-and-white images. When 
there is only limited contrast between the text and the background, the back- 
ground can create "noise" that renders the recognition difficult or impossible! 

Think for instance of black text printed on a dark background: when scanning 
such a document in black-and-white, you may not be able to "drop" the back- 
ground color without losing the text information as well, as much as you may try 
to adjust the scanner brightness... 
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Readiris creates a black-and-white version for every greyscale and color im- 
age. Thanks to its intelligent routines, even tough cases get solved - here's how a 
"difficult" image gets binarized! 

MASAYOSHI SON, 42. president and CEO, 
is the master Net empire builder His con- 
glomerate holds stakes in 300 Internet 
companies in the U.S., Japan, Europe, and 
other Asian countries. Today, Softbank 
manages about $4 billion in venture capital 
funds for global investments. 

YASUMITSU SHIGETA, 35, has invested in 
more than 70 Web or mobile Net-based ven- 
tures in Japan and the U.S., including Tum- 
bleweed Communications and Phone.com. 
Shigeta is also developing new businesses 
that take advantage of the growth of the 
Internet and mobile communications. 



To view a scanned image in black-and-white, disable the option "Display Docu- 
ment in Color" under the "View" menu. 



^ Display Document in Color Ctrl+O 



w 



When this option is enabled, you won't see any black-and-white images on 
your computer screen - even when you're actually scanning bilevel images! That's 
because the option "High-Quality Display" under the "View" menu optimizes the 
images for an optimal on-screen legibility. 



This specialized high-resolution display technique converts black-and-white 
images into greyscale images. 



Reading dot matrix documents 

You can read dot matrix document without changing the font mode. The software detects 
whether 'nonrial" text or dot matrix printouts anc being read. 

Far out: in the uncharted backwaters o-f the unfashionable end q-F 
the Western Spiral arm o-F the Balatiy lies a small unregarded 
yellow sun- Orbiting this iit a distance o-f roughly ninety-two 
iTiillion miles is an utterly insignificant little blue green 



Reading dot matrix documents 

You can read dot matrix document without changing the font mode. The software detects 
whether normal" text or dot matrix printouts are being read 

Far out in the uncharted backwaters of the unfashionable end of 
the Western Spiral arm of the Gal sky lies a small unregarded 
yellow sun^ Orbiting this at a distance o-f roughly ninety-two 
(Tiillion miles is an utterly insignificant little blue green 



Greyscale and color images are softened, smoothened. 



I A word about OCR 

The aim of OCk is to LiiUonialically enEer prinled text documents in a very effective md 
.: . k>w coat way. Although the furst resuiuch and dcvclopmLnt on Optical Character 

IKecogniLion (OCR) bpgan more than 3(1 years ago, this technology is still unknown by 
mofit of the people who could use it for their dociunjcnt entry applicatinns. 
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I A word about OCR 
The aim of OCR is to tiutomatically ^.'ntLT printed text documents in a very effective and 
low cost way. Although the first research and development on Optical Character 
Kecognition (OCR) began more than 30 years ago, this technology is still unknown by 
most of the people who could use it for their document entry applications. 

As a result, there's no need to zoom in, even on laptops with an LCD screen 
or desktop computers with a low-end 14" screen. High-quality display is enabled 
by default, but may be superfluous on high-resolution computer screens. 

Different Devices^ Different Resolution 



Whatever your scanning mode may be, use a scanning resolution of 300 dpi 
for normal applications. Use a higher resolution of 400 dpi for small print (below 
10 point) and when the document is very degraded. 

Readiris reads point sizes of 6 to 72 point (0.08" to 1 or 0.21 to 2.54 cm). 

6 point 

72 point 

Readiris also recognizes "drop letters", large caps that cover several lines. 
(These can of course be no bigger than 72 point!) 

Readiris reads drop 
letters (also called 
"drop" caps) that 
cover several lines and 
assigns them to their starting 
line. 
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As optimal OCR requires a resolution between 300 and 400 dpi, Readiris 
warns you when you're submitting images with a resolution lower than 200 dpi or 
higher than 800 dpi. However, Readiris can correct scans with too much detail 
for you! Enable the option "Optimize Resolution for OCR" in the scan settings to 
do so. Whenever the image resolution of your scans exceeds 600 dpi, the resolu- 
tion is reduced for the OCR process. 

I P Dptimize resolution for OCR 



There are other ways of avoiding this warning: you may be reading faxes - 
which have a resolution of 100 or 200 dpi -, when you're creating images with a 
digital camera - where the resolution is unknown - and when you're opening 
images where the file header contains an incorrect resolution. To process such 
images hassle-free, enable the option "Process as 300 dpi". This setting applies to 
both direct scanning and the opening of prescanned images. 

|~ Invert ^ Digital camera 

|~ Digital camera P Process as 300 dpi 

P Process as 300 dpi F Smoother! color images 

r Srmoothen color irmages p l^^j ppp documents in color 



When your images are acquired by a digital camera instead of a scanner, it 
is mandatory that you enable a special option (that also applies to scans and 
prescanned images). 

p If^^gfj P Digital carmera 

P Digital camera 1^ Process as 300 dpi 

P Process as 300 dpi P Smoothen color images 

r Smoothen color images p Load PDF documents in color 



By doing this, you enhance the image before it gets recognized. There are 
specific challenges to be met when it comes to digital cameras: they produce 
low-resolution images - even when you hold the camera very close over your 
document - and the image resolution is in any case unkown. 
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There are some "finer points" to be aware of when it comes to successfully 
recognizing images captured with a digital camera. 

First of all, select the highest possible image resolution. Create for instance 
2,048 X 1,536 size images when 1,024 x 768 and 640 x 480 images are also 
supported. Secondly, enable the "macro" mode of your camera to take closeups 
- which is always the case when you photograph documents. (This mode was 
designed to capture flowers, insects etc.) Otherwise, the images are unsharp and 
illegible. 



PPolice seek ^ ^ 

I /^yi^erfraud and electronic 
I Wextortron is becoming a flH^ 
I real problem for UK M 
^^usinesses, and the ldW||||^| ^H|d^ 



Limit yourself to no or small compression: important compression reduces the 
sharpness of the captured text. Zoom manually to crop your document - some 
cameras are bundled with photo stitching software, but don't bother using it for 
document capture. 

Hold the camera directly above the document to avoid capturing the docu- 
ment at an angle. However, avoid shadows cast on the document by the camera 
or your hand! Produce stable images. Consider mounting your camera on a tripod 
when necessary. 

Disable the flash when you're filming glossy paper, otherwise the image may 
be too light. Generally speaking, adapt the brightness and contrast to the environ- 
ment - day light, lamp light, neon light etc. (Some cameras can be calibrated by 
filming a white document.) 



To give it a try, open the image DIGITAL. JPG in the Readiris folder and 
execute the recognition. 
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Saving Default Settings 



Set all scanning parameters correctly and click the command "Save Default 
Settings" under the "File" menu to save the current settings as default settings for 
future use. 



Save Default Settings 



H 

Settings files contain more than the scanner settings: they also determine 
whether you are going to use interactive learning, which language the documents 
have, which output mode is used - for instance send text to WordPad - etc. In 
short, ^77 operational settings of Readiris are stored in the settings files. 

Saving Specific Settings 



The default settings will obviously be used at each program startup, but you 
can save specific settings as well to avoid having to redefine the operational 
parameters. The commands "Save Settings" and "Load Settings" under the "File" 
menu take care of this. 




Save Default Settings 

Let's give an example: if you regularly have to OCR English documents with 
a specific layout, you are recommended to create a settings file for this type of 
document. You would then select "English" as the document language, load a 
specific zoning template to avoid having to reapply the same windowing each 
time, disable learning but activate a font dictionary in the "read" mode because 
the same typefaces are used systematically etc. 

If you are unsure what the current settings are, you don't have to "plunge" 
into every menu and command to discover what they are. You can use the com- 
mand "Info" from the "File" menu to get an overview. 
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Scanner 
Model 
Resolution 
Format 



Mode 
Landscape 



HP ScanJet 5500C 

300 dpi 

A4 



Black-and-white 
Off 



- Document - 
Font Type 

-Page 

Resolution 



r 



Tent- 



□ K 



Format M icrosof t Word 37 / 2000 / 2002 / 2003 
Paragraph On 

Layout Recreate source document 



Automatic 



Language English 



Scanning Documents 



Now that our scanner is set up, we want to get started scanning documents. 
There are some elements you should be aware of. 

First of all, pay some attention to lineskew. Although the page analysis and 
recognition are skew-tolerant, it may become difficult to window and OCR a 
page correctly when the skew is too significant. Limited lineskew (less than 0.5°) 
can be ignored because the OCR accuracy does not suffer. 

The option "Page Deskewing" under the "Options" button (and under the "Set- 
tings" menu) determines whether pages which were scanned at an angle will be 
deskewed, straightened automatically - limited lineskew gets ignored. This op- 
tion is disabled by default. 
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If you forgot to enable this option, use the "Deskew Page" button on the image 
toolbar (or the command "Deskew Page" under the "Process" menu) to 
"straighten" pages which were scanned at an angle. 



©I 



The deskewing takes a few seconds: the image is analyzed to detect the skew 
angle - if any the color or greyscale image and its black-and-white version are 
deskewed and the page analysis gets re-executed. 





You may also need to adjust the page orientation. Use the rotation tools on 
the image toolbar. (Corresponding commands are found under the "View" menu.) 
Three rotation directions are available: to the left, to the right and upside down. 
Rotation also takes a few seconds as the image itself is updated, not just the 
display on-screen. 




Rotate Right 
Turn Upside Down 



However, Readiris can correct badly oriented pages for you. Enable the op- 
tion "Detect Page Orientation" under the "Options" button (or under the "Set- 
tings" menu) and Readiris will correct the page orientation where needed. 
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Options 



You can make good use of the image DESKEW. JPG in the Readiris folder if 
you want to try it. Enable the options "Page Deskewing" and "Detect Page Ori- 
entation" before you open the image and let Readiris restore the Tower of Pisa 
the way we like it. 
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Adjusting the Scanned Images 



As was already indicated, powerful intelligent routines automatically convert 
color and greyscale images into black-and-white. Should this still be necessary, 
the user can optimize the image further for the consecutive OCR process. Select 
the command "Adjust Image" under the "Process" menu to do so. 




When you access this command, the black-and-white version is displayed 
automatically. (It's as if you disabled the option "Display Document in Color"!) 
There are some complicated concepts here, and we need to discuss them in 
detail. 



Adjust Imdgie 



\^ Smoothen color image 

With some scanner models, reduction of the 
sharpness is needed to recognize color and 
greyscale images. 

Smoothening allows to separate the tent from 
the colored background 



-Brightness 

(* Automatic 
C Manual 

lighten 
-Despeckle: off— 

J 

0 



127 



darken 



20 



□ K 



Cancel 



Apply 



Help 



The option "Smoothen Color Image" renders greyscale and color images more 
homogeneous by "flattening", smoothing out relative differences in intensity. As a 
result, a stronger contrast is created between the foreground - the text - and the 
background - a color, artwork etc. 

This preprocessing feature may seem highly technical and difficult to un- 
derstand, but it certainly has its role to play: with some scanner models, this 
reduction of the sharpness is needed to recognize color and greyscale images. 
Smoothening is sometimes the only way separate text from the colored back- 
ground! Below is a sample image that is simply illegible without image smoothing. 
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■ JN QUEST OF CAL\TSO 

H from only £ ! ,6 50* 

r 16 ni^ts. "hh. - 25th Ort 2000. 

■. ■ ' -it^^^it?- M^Stii t>^de|l^ '.-l 



IN QUEST OF CALYPSO 
from only £1^650,* 
16 nights. 9th - 25di Oct 2000. 



The image smoothening can also be enabled when you load prescanned im- 
ages into memory! 

Files of tvpe: |AII image files 

n Digital camera 

l~ Process as 300 dpi 

1^ Smoothen color images 

|~ Load PDF documents in color 



The brightness now. By "brightness", we actually mean the black-and-white 
threshold. The setting "Automatic" determines the bilevel threshold automatically. 
Apply a different threshold when necessary by darkening or lightening the black- 
and-white image: when you darken the image, more pixels become black in the 
black-and-white version, when you lighten the image, less pixels become black in 
the black-and-white version. 

Note above all that no image adjustment is executed until you click the "Ap- 
ply" button! By clicking "OK", you execute the adjustment andQ\os>Q the window. 
Here's an example where we lightened the black-and-white image dramatically - 
though admittedly not with OCR accuracy in mind! 
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W Smoother! color image 

With some scanner models, reduction of the 
sharpness is needed to recognize color and 
gre^Jscale images. 

Smoothening allows to separate the text from 
the colored background 



■Brightness 

C Automatic 
^ Manual 



lighten 
■Despeckle: off- 



67 



darken 



20 



OK 



Cancel 



Apply 



Help 



The first two options concern color and greyscale images, the last one, 
"Despeckle", exclusively concerns black-and-white images. "Despeckling" means 
that the "parasite pixels" (also called "salt and pepper noise") will be removed 
from black-and-white images. 

If computers can't If computers can't 

adapt easily, then adapt easily, then 

rnaybe the people maybe the people 

using them can. using them can. 
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Be sure that you don't erase spots that are too big, otherwise you might start 
erasing the dots on "i", portions of dot matrix letters etc. ! 

- Despeckle: remove 1 0 pixel dots 

J 

0 20 
Removing too large dots may erase useful 
information from the image 



The best way of optimizing the images for the OCR process is this: place the 
adjustment window where it doesn't prevent you from judging the image adjust- 
ment you execute. Adapt the parameters - clicking "Apply" each time - until the 
image is crisp and clear. 

Letting the OCR Wizard Work for You 



Let's get started capturing documents now. Instead of going through all the 
parameters, we'll use the OCR wizard, a very comfortable way of recognizing 
pages. 

Click the "OCR Wizard" button on the main toolbar (or select the command 
"OCR Wizard" under the "Process" menu). 




The wizard guides you through the OCR process comfortably: answer a few 
simple questions and you'll obtain quick and easy results with Readiris. 
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< Back 


Newt > 




Cancel 



Actually, the OCR wizard starts running each time you start up Readiris; you 
can avoid this by disabling the option "Enable Wizard on Startup" in the first 
screen of the wizard (and with the equivalent option under the "Settings" menu). 

Readiris Recreates Your Document Layout 



The OCR wizard renders the recognition process highly automatic, but "auto- 
matic" OCR should nothQ confused with autoformatting! "Autoformatting" means 
that Readiris recreates a facsimile copy of the scanned document: the word, 
paragraph and page formatting of your original document are applied. 

Similar typefaces (serif and sans serif, proportional and fixed, normal and 
condensed) are used as in the source document, the point sizes and typestyles 
(bold, italic, underlined, superscript and subscript) are maintained across the rec- 
ognition. The tabs and the alignment (left, centered, right and justified) of each 
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text block are recreated. So are the bulleted and numbered lists. Any e-mail 
adresses and URLs of web pages get detected and recreated as hyperlinks in the 
output. The placement of columns, text blocks and graphics follows your original 
document. 

In other words, Readiris allows you to archive a true copy of your documents, 
be it a editable and compact text file instead of a scanned image! 

All this implies that the sorting of windows only partially applies when 
"autoformatting" is used: you can include and exclude zones, but any re-ordering 
of zones is simply ignored! 

Here's an example of how it works. To get acquainted with this feature, open 
the image AUTOFORM.JPG which is found in your Readiris folder. 



^ Readiris 



File Edit Settings View Process Learn Register Help 




Autoformatting 

The aim of "autoformatting" is to recreate a faesimile copy of 
the original document. 



rr^he OCk pjocess does more than just 
I rtcognw yoM text, it can Tawmai \l for 
A. yau too! 

Ill a way, text ns^mtion is becoming 
more and more page recognition or docuinefU 

recogntlion... 

Whether your OCR software reformats 
the recognized text or ncrt is up to the user. 
Vou can perflomi OCR because you just need 
the text, in which case yau will edit mi 
formal it you-rself, and ycrn csn, recreate thg 
gPMft?<|gttinwn^ including ils formatting. 



The various lev* Is af 
formatting ara : erftating body 
text, retaining the wora and 
paragraph formatting and 
creating a facsiTnile copy. 

Creating body text means no 
formatting is applied; you 
get a continuous, running 
text. All formatting, if 
any, is done afterwards by 
the user. 

If y)u retain the word and paragraph Jbrmatling^ 
t)i« font type, size and t>peslyk are maintaioed 
serosa ihe rect^ition. The Jastincaiioa of the 
par^giaphs is also detected, However no gr^tucs 
are captured And the oolutnits aren't lectciited - 
the paragraph just roilaw eacb Oither etc. 

"AiHofonnalting" recreates a facsimile 
copy of the original dwument; the tm 
blocks, graphics and tabJes arc recreated in 
the same place and the word atid 
paragraph forniattine are mainEaincd 
across the recognition. 
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A* a KmlU you gel 3 (me copy of your source 
document, be it a compact and editable text 
file, no longer a scann^ image oF your 
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Click the "Format" button on the main toolbar and choose to send the OCR 
result to Microsoft Word or select the RTF (Rich Text Format) or Word (DOC) 
ft)rmat. Secondly, select "Recreate Source Document" as layout option. (The 
option "Merge Lines into Paragraphs" is enabled by default to apply wordwrap 
within the paragraphs.) 
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l_ayi_iiji. 

Create body text 

C Retain word and paragraph formatting 

(* Recreate source document 

P Use columns instead of frames 
P Insert columns breaks 







Whether layout reconstruction is available depends on the selected output 
mode. Some "poor" formats generating "plain" text such as Text (ANSI), MS- 
DOS Text (ASCII) etc. do /70/' support advanced formatting codes and therefore 
cannot offer autoformatting. The Adobe Acrobat PDF format on the other hand 
was designed to copy the look of your documents: PDF documents by nature 
imply autoformatting. 

When the recognized text is opened using a wordprocessor, the text looks like 
this without ^/7j intervention by the user. 
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H autoform - Microsoft Word 






□0® 


: File Edit View Insert Format Tools Table Window Help 






Type a question for help -r x 


: J J id J J.!,^' ! -9 4Jaead j 


1 : Times New Roman 


- 5 


i B I u ! S B ! |E :E -ip 1 ffl ' H 



Autoformatting 

Tlie aim of "autofoiiiiattiaig" is to recreate a facsimile copy of the 
oiisiiial docmnent. 



T 



he OCR process does more than just 
recognize your text, it can format it for 
you too ! 



In a way, text recognition is becoming 
more and more page recognition or document 
recognition. .. 

"Whether your OCR software reformats 
the recognized text or not is up to the user. 
You can perform OCR because you just need 
the text, in which case you will edit and format 
it yourself, and you can recreate the source 
document , including its formatting. 



The various levels of 
formatting are: creating body 
text, retaining the word and 
paragraph formatting and 
creating a facsimile copy. 

Creating body text means no 
formatting is applied: you 
get a continuous, running 
text. All formatting, if any, 
is done afterwards by the 
user . 

If you retam the word and paragraph formatting, 
the font type, size and type style are maintained 
across the recognition. The justification of the 
paragraphs is also detected. However, no graphics 
are captured and the columns aren't recreated the 
paragraph just follow each other etc. 



Page 1 5ec 1 
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To see the effect correctly, you need to enable the "WYSIWIG" mode of 
your wordprocessor, mostly called "page layout" mode. However, if you send the 
recognized document directly to Microsoft Word, the page or print layout view is 
activated automatically! 



2-9/ 



User 's Guide 



View 






Normal 
Web Layout 


p 


Print Layout 



In short, Readiris not only recognizes your texts, but can format them for you 
as well. OCR isn't just text recognition anymore, it is becoming more and more 
page or document recognition as well! 

Columns Please^ Not Frames! 



The formatting option "Use Columns instead of Frames" determines how the 
"autoformatting" gets done: the text blocks, tables and graphics can either be 
stored in frames or in editable columns. 





C" Create body text 

C Retain word and paragraph formatting 

(* Recreate source document 

R Use columns instead of frames 
P Insert columns breaks 







"Frames" are separate containers for text used to position several blocks of 
text, graphics and tables on a page. With columns, the text flows naturally from 
one column to the next, and columnized texts are much easier to edit. 

We now assume that real columns do occur on the scanned document: when 
the system is unable to detect columns in the source document, this formatting 
mode uses frames anyway as a "fallback" position! 

You can make good use of the image COLUMNS.JPG in the Readiris folder 
if you want to try it. 



^ columns Microsoft Word l^®® 
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Number of columns: ^ tA 



"Width and spacing 
Col #; Width; 



□ Line between 
Preview 



6,39 cm 0 0j34cm 



6,39 cm * 



I I Equal column width 

Apply to ; Whole document 



Page 1 



luiii - i tiiDught, it i dom raose some money 
this way, at least I'll get some attention, and let 
everyone know that I'm boking for money for 
the post-production.'" 

The busking episode is just the latest segment m 
a fiLmmaMng saga that has stretched over the 
better part of nearly four years . For Stark and his 
crew, making Tiii hasn't just been a krk to goof 
around with on the weekends. It's been a 
constant mission. 

Stark met his chief collaborator on the film, 
writer/ director Lance Peverley, while both 



)on QuiHote story 
;a's point of view, 
1 the role of the 
;nzer, an out-of- 
quickly cast Tom 
ir on The X-Filss 
Lbetteras Frohike 
:i-iti the lead, and 
eir first day of 
gly endless shoot, 
sandwiched in 



iith at all, so we 
se money for one 
to giet the sec- 
"When 3?tiu're 
■you can put your 



pianrang one aay ol limung, 
best effort into high production values-I wanted 
it to look hke a highly polished fihn that told 



^ <_ 

Sec 1 



1/1 



At lj2cm Ln 1 



"We had no money to start 
with at all. so we thouglit if w e 
could at least raise money for 
one day, that w ould give us 
leverage to get the second day 

REC TRK EXT OVR French (Fra 



The option "Insert Column Breaks" refines the recreation of columns: it deter- 
mines whether you insert "hard" column breaks at the end of each column or not. 
With column breaks, any text you edit, add or remove remains inside its column; 
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no text ever flows automatically across a column break. All text that follows a 
column break is moved to the top of the next column! 

Enable this option when you want to maintain column breaks where these 
were detected in the recognized document - whatever text editing gets done after 
the OCR. In newspapers and magazines, the various columns on a page often 
correspond to different article "threads". Having text flow from one column to 
the next "on the sly", covertly may not be a good idea! 

Disable this option when you have columnized body text: you'll ensure the 
natural flow of the text from one column to the next. 

Text Formatting^ Part 2 



The other layout options are "Create Body Text" and "Retain Word and Para- 
graph Formatting". 

As the icon on the right side illustrates, creating body text means you create 
a non-formatted, "running" text. The text will be captured, but its formatting is 
entirely ignored. Use this option when you just need to recapture a text but not its 
layout. 





(* Create body text 

C Retain word and paragraph formatting 

C Recreate source document 

F LI...- instead of frames 

W Insert columns breaks 







Body text is also what you get when you quickly recognize a text zone by 
right-clicking it and selecting the command "Copy as Text": when the recognition 
is done, you'll paste body text into your text application. 

The option "Retain Word and Paragraph Formatting" represents the middle 
road: the word formatting - font type, point size and typestyle - is retained 
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across the recognition, and so is the paragraph formatting - the tabs and the 
ahgnment. 

Don't confiise this formatting option with "fiill" autoformatting: this option just 
puts one paragraph after the other, it does not recreate columns or copy the 
relative position of the various zones. 

Exporting Text Several Times 



Actually, you can export the OCR results several times without repeating the 
recognition! Change the text format and the formatting options under the "For- 
mat" button and click the button "Recognize" again. No OCR is executed this 
time - unless you defined new windows or modified existing ones! Otherwise 
Readiris just reformats the OCR results and saves them in the new text format or 
sends them to the target application you've just selected. 




The same goes for any other element you change: when you add a page to 
your OCR job, only that page will be recognized. When you create a new text 
zone on any page, only that zone will be recognized before the results get ex- 
ported. 

You could for instance recognize a 10 page document and save it in a Word 
file. Then you quickly scan the abstract found on the cover page and send it by e- 
mail to an impatient colleague to finally scan the appendix - a table - and save all 
results in an HTML file to be posted on your company's web site. 

Saving Graphics Separately 



In our example, the graphic was included in the recognized text; whether this 
is the case depends on the formatting option "Include Graphics". Whether it is 
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possible to save graphics inside the text again depends on the output mode. "Poor" 
text formats such as Text (ANSI) etc. don't store graphics! 

- □ ptions 1 

P Merge lines into paragraphs 

P Include graphics 

Still, with Readiris, you can save graphics without performing text recognition. 
As Readiris generates black-and-white, greyscale and color images, you can 
capture lineart graphics and photos. 

How? Draw a graphic zone around the illustrations, cartoons etc. you need. 
Creating graphic windows manually is done in the same way as drawing text and 
table windows, simply select the "Graphic Window" tool now. 




Next, choose the command "Save Graphics" under the "File" menu. 




You are prompted to specify a filename. Determine which graphic file format 
you will use. Select a format that's supported by your paint or photo retouching 
software. The JPEG, TIFF and Paintbrush (PCX) formats are supported. Enable 
the option "Greyscale/Color" to save the graphic as a color or greyscale graphic. 
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Save Graphics... 


0® 


Save in: |Q Mv Documents jj ^ rf" lOil- 




fr^Mv Music 
(^My Pictures 

IHmv Videos | 






File name: j 


Save 1 






Saveastvpe: jTIFFT-tif) 


Cancel 


IjPEG r.jpg) 1 





To send a graphic to the cUpboard rather than save an image file, right-cUck 
your mouse over a graphic window and select the command "Copy as Graphic": 
the graphic zone under the mouse pointer is ready to be pasted! 
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source document , including its formatting. 




Reading Faxes and Deferred Recognition 



Saving images as image files opens another possibility: you can save the full 
page and perform deferred OCR on it later on. That's what we did with the 
prescanned images of our tutorials. 

Simply scan the document. Select the command "Save Full Page as Image" 
under the "File" menu to save a single page. You'll again be prompted to save the 
entire page as TIFF or Paintbrush (PCX) file. 




Select the command "Save All Pages as Image" to save a multipage docu- 
ment. A single file format is available here: multipage TIFF. 

You can now select the disk as image source and open the image file with the 
"Open" button (or with the corresponding command under the "Process" menu). 
(If you use the "Open" command under the "File" menu, you don't even have to 
update the image source.) 

As color, greyscale and black-and-white images are supported on an equal 
basis, Readiris opens Adobe Acrobat PDF documents, JPEG images. Paintbrush 
(PCX) images, DCX fax images (a multipage version of the Paintbrush format), 
PNG images, TIFF images (uncompressed, LZW, PackBits, Group 3 and Group 
4 compressed), multipage TIFF images and Windows bitmaps (BMP). 

This capability is particularly useful to convert your faxes into editable text 
files! Readiris uses extra intelligence when it comes to reading faxes: the soft- 
ware detects the typical fax resolutions - 100 x 200 dpi ("normal quality"), 200 x 
200 dpi ("fine quality") and 200 x 400 dpi ("superfine quality") - and "prepro- 
cesses" these images automatically to ensure optimal OCR results. 

Nevertheless, it's still a good idea to ask your correspondents to send faxes 
with the "fine" quality - those faxes will yield better OCR results. 

Don't forget that you can right-click on images in the Windows Explorer and 
select the command "Recognize" from the "Context" menu to open images! Al- 
ternatively, you can use "drag and drop": drop image files from the Windows 
Explorer onto the image zone or icon of Readiris and they are promptly opened. 

Recognizing Tables 



So far, we've recognized texts and faxes and we've saved graphics. Let's 
process a table now. Take a table of figures and scan it, or open the sample image 
TABLES.JPG in your Readiris folder. 

Actually, the image TABLES.JPG contains two tables, and that's no coinci- 
dence! The page analysis zones them as table windows, and Readiris will recon- 
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struct them for you by recreating the tables cell by cell in your spreadsheet or by 
inserting a table object inside your wordprocessor files. 

Let's explore the different solutions, starting with the "gridded" or "framed" 
table - it has borders around the cells. 



File Edit Settings View Process Learn Register 



L In 

I Options 



it? 



— « 

L English 

L Learn 



Reading Tables 



Readiris recognises tabular data and recreates them Cell hy cell in worksheets 
or as table objects inside wordprocc^or files. 

To insert Lables as table objects, you mtist rerain the word and paragraph Ibrmatting or rccnciilt: the 
ioiirce docQiiient; see the "Format" button on the main toolbar. 

Thc page analysis dclcctji "gridded" and "LLngridded" tables. "Gridded" or "framed" tables have 
UirJcr.s {irtmnd iHq. tells - tis dtitis tljc example below, The borders of the table celis get recreated. 
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Tested on 333 MHz Pentium TI PC with 64 MB RAM and 4 GB SCSI HD 



"Ungriddcd" tables don't have any borders around the cells. When Ihc columm of ungiidded 
tables are too widely spaced, the pa^ analysis may not detect a table window to avoid cotifusion 
with wlumnij^cd ie,Jtt blocks. 

When your tables exclusively contain aumeric characters, enable the nnmeric rcadin<j mode with 
the "Lafiguage" button on the main toolbar for incmased aticnracy. 



Run the recognition with the layout option "Retain Word and Paragraph For- 
matting" or "Recreate Source Document" enabled and the table gets recreated. 
Open your wordprocessor to have a look at the result: the cells and the borders 
were recreated by Readiris one by one! (You could obviously have included the 
text paragraphs in the text file as well.) 
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I 



Now the "ungridded" example - it has no borders around the cells. Note that 
the page analysis nevertheless detects the table! 
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I File Edit Settings View Process Learn Register Help 
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**L"nEricld«l" tables don't have any bordcre iiround ihc Lclk. When the columns of ungridded 
lablchi arc Uhj widely spaced^ the page analysis may not detect a table window to avoid confusion 
with colummi/icd text blocks, 



When your tab!cs cxclusiivcly contain numeric characters, enable the numeric reading mode with 
cEie "Language" button on the main toolbar for increased accuracy. 
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l'"ma!!y, you can send your tables of figures directiy to Microsoft Excel by selecting the spreadsheet 
as target application - inefef to the "Fonnat" button on the main toolbar. 



Copyright hnage Retogtiiuon Integrated iSysHims 
Web site: hitp://www.ifisiiiik,coni 



For optimal OCR accuracy, you should limit recognition to the numeric sym- 
bols with the "Language" button. (The numeric mode is not strictly numeric, it 
includes the symbols 0 to 9, +, /, %, , (comma), . (dot), (, ), -, =, $, £, ¥ and the 
€ symbol.) 
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As you can only do this when the table doesn't contain any alphabetic symbols 
- otherwise the text portions won't be recognized correctly - we can activate the 
numeric mode now but couldn't do it for the first table. 

This time, we will send the OCR result directly to the spreadsheet Microsoft 
Excel, so we select Excel as target application under the "Format" button. 
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The spreadsheet is started up automatically and the result looks like this: the 
typical table structure with rows and columns is recreated, and you are immedi- 
ately ready to process the data. 
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You may come across "ungridded" tables the page analysis does not detect as 
table zones because the columns are too widely spaced - Readiris tries to avoid 
confusion with columnized text blocks. To create a table window manually, click 
on the "Table Window" tool in the image toolbar and proceed as usual; the button's 
tooltip again indicates the number of table windows. 

[Praw table window: o| 



Getting On-line Help 



This concludes our overview of Readiris. Some last-minute information may 
not be included in this manual. We thus recommend you to consult the on-line 
help system for additional information on Readiris. 

Go to the "Help" menu to do so. The command "Help Topics" and its shortcut 
key F 1 allow you to navigate through the many help topics. 





2- 104 



§? Rcddiris help 



Hide Back Forward Home 



Print Options 



Contents Index Search 



Hlwelconne to iihe Readiris hieip 



^ Introducing OCR 

^ Recognizing Documents 

^ How to.,.? 

^ Reference Infornnation 

^ Software Versions and Options 

^ Product Registration 

^ Product Support 

%t I.R.I.S. 




Welcome to Readiris^^^ Help.. 

• Use on-line help to learn more about Readiris. 

• Quickly find answers to questions. 

• Connect to the I.R.I.S. web site for latest tips 
and product updates, 

©2004 Copyright I,R,I,S. All rights reserved 



The other commands of the "Help" menu tell you how to get product support, 
how to contact I.R.I.S., give direct access to the I.R.I.S. home page etc. 



